A Herd of Elephants: The Proliferation of Hadoop Distributions

May 10, 2013

At the Strata Conference in February, EMC Greenplum and Intel debuted their own Hadoop distributions. That makes six major Hadoop distributions, when you include Cloudera, MapR, Hortonworks and IBM. While this certainly hints that Big Data is beginning to hit its stride, there’s been some debate around proprietary distributions of Hadoop and how successful they will be in the long term. Because the Splice SQL Engine is built on the Hadoop stack, we thought we’d chime in with some thoughts on the pros and cons of major players like EMC and Intel getting into the fray.

More distributions will likely lengthen sales cycles, because more choices tend to slow down buying decisions. For an organization considering its Big Data strategy, the question is no longer just whether to use Hadoop, but also which Hadoop distribution to choose. That question extends to companies like Splice Machine, which have to decide which distributions to support. However, this might be moot if the new distributions don’t get traction.

All of that said, Intel and EMC are very large, well-respected technology providers that will spend significant marketing dollars increasing overall market awareness. Their presence in the Hadoop market increases the legitimacy of a Hadoop solution for mainstream corporations. In addition, new competition should only serve to raise the bar and, ultimately, make distributions better.

On balance, we believe that the new distributions are a net positive for the Hadoop market. The increased market awareness will more than compensate for any short-term confusion over which distribution to choose. For a more in-depth look at the new Hadoop landscape (one that also stays neutral), check out this story from Andrew Brust in ZDNet.

We look forward to seeing how this all turns out. Do you have any thoughts? Leave them in the comments section below!