Webinar: Streamlining the ETL Pipeline with Hadoop

May 20, 2015
on demand

ETL too slow or constantly breaking? Find out how to use Hadoop to fix your ETL process while avoiding the landmines.

Splice Machine, provider of the Hadoop RDBMS for Big Data applications, invites you to watch our webinar: Streamlining the ETL Pipeline with Hadoop.

Historically, ETL has been treated as an afterthought: companies approached it with a set-it-and-forget-it mentality. Before Big Data this was acceptable, but as some organizations are discovering, that approach needs to change because the data pipeline itself is growing more complex, in both its sources and its destinations. By transforming the ETL process, organizations can improve data quality, data recency, and data availability. Furthermore, reducing the time spent preparing data can make analysts more productive and lead to more data-driven business decisions.

Companies have begun migrating ETL for high-value use cases to a scale-out architecture such as Hadoop, often once their scale-up ETL architecture starts “falling over.” Such a migration offers a number of benefits. Because Hadoop scales out, it reduces costs: companies can handle much larger volumes of data without a linear increase in cost.

While Hadoop addresses many of ETL’s pain points, it is not a perfect solution. Any problem with the ETL process, whether due to logic failures, data changes, or data quality issues, requires a reload or restart of the entire process.

ETL on Hadoop

Such restarts are not uncommon, which is why those who migrate ETL to Hadoop often find the process too brittle. Because files in Hadoop’s file system (HDFS) are write-once and cannot be updated in place, you can’t correct problems midstream. Furthermore, ETL on Hadoop can be more complicated than traditional ETL because it requires pulling together a variety of tools from the Hadoop open source ecosystem. Although these tools are readily available from the open source community, organizations must understand which tools they need and have the skill sets to use them.

In this webinar, we take a closer look at ETL, the pain points that arise in the face of Big Data, and how they can be addressed to improve the Big Data analytics process. Splice Machine’s Rich Reimer, VP of Marketing and Product Management, also discusses in detail how enterprises can improve data quality, data recency, and data availability by using a Hadoop RDBMS to fix their ETL.
