Webinar Recap: Accelerate ROI on Your Data Lake Investment

October 8, 2018

Building a sophisticated data lake used to be an extremely complex and time-consuming process. While many strides have been made, particularly in the implementation process, steep challenges remain. Chief among them is ROI. In fact, although 9 out of 10 large organizations have started on their data lake journey, many are struggling to generate value from their significant investments.

Splice Machine, provider of a leading data platform to power intelligent applications, recently held a webinar with Diyotta to discuss how they’re helping large companies like Sonoco drive measurable results that can deliver ROI from a data lake. The webinar featured Krishnan Parasuraman, VP of Sales and Business Development at Splice Machine, Ravindra Punuru, CTO at Diyotta, and guest speaker, Brent Snyder, Senior Data Architect at Sonoco.

Data Lakes: The Promise … and the Reality

The allure of data lakes is due in large part to their potential and promise. Many companies expect their data lake to be a key component as they look to become an intelligent, AI-driven enterprise that can achieve true digital transformation: harness data to discover insights, stream data in real time and offload data marts, all while having the ability to scale and reduce costs.

In reality, data lakes all too often fall short of these expectations for a number of reasons. These include:

  • Data lake projects take too long to deliver:
    • Acquisition from source systems is cumbersome
    • Multiple storage and compute engines need to be glued or duct-taped together
    • Change data capture (applying incremental changes) is difficult
  • Business user expectations are not met:
    • Multiple consumption paths; too many tools to learn
    • Unable to operationalize insights (AI / ML)
    • Applications don’t work seamlessly against the data lake
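
To make the change data capture point above concrete, here is a minimal Python sketch of applying an ordered batch of incremental changes to a keyed snapshot. It is an illustration only: the `apply_changes` function, the change-record format and the sample rows are hypothetical and were not shown in the webinar.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change record: (operation, primary key, column values or None)
Change = Tuple[str, int, Optional[Row]]


def apply_changes(snapshot: Dict[int, Row], changes: Iterable[Change]) -> Dict[int, Row]:
    """Return a new snapshot with the changes applied in order."""
    # Copy every row so the caller's snapshot is left untouched.
    result = {key: dict(row) for key, row in snapshot.items()}
    for op, key, row in changes:
        if op in ("insert", "update"):
            if row is None:
                raise ValueError(f"{op} for key {key} carries no row data")
            # An update may carry only the changed columns, so merge into the existing row.
            result[key] = {**result.get(key, {}), **row}
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is treated as a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


snapshot = {1: {"status": "open", "qty": 5}, 2: {"status": "open", "qty": 1}}
changes = [
    ("update", 1, {"status": "shipped"}),
    ("delete", 2, None),
    ("insert", 3, {"status": "open", "qty": 7}),
]
merged = apply_changes(snapshot, changes)
```

Even this toy version has to handle ordering, partial updates and deletes, which hints at why doing the same thing across several storage engines is hard.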

How Do You Simplify the Architecture?

One of the key challenges with data lakes lies in their potentially complex architecture. Often, companies must ‘duct-tape’ different engines together, leading to fragmented data consumption.

For example, let’s say a company is running a traditional Hadoop data lake with multiple nodes. Depending on their use case, they may also be running business intelligence, analytics, real-time/streaming apps, and so on. To interact with their data, they must also use another layer of tools (Hive, Spark or HBase, depending on the use case). This extra layer slows down access to their data lake, raises costs and requires their team to be familiar with additional tools.

Enter Splice Machine, a data platform that offers a simplified solution, combining a SQL RDBMS (OLTP), data warehouse (OLAP) and machine learning, all in one. Splice Machine can run on AWS, Microsoft Azure, Heroku or on premises.

If you’re running a business intelligence application, queries would pass through Splice Machine’s JDBC/ODBC service layer to one of the compute nodes, where they would be processed by HBase or Spark (both of which are built in), depending on the transaction type. There are also several ways to access your data, including a native Spark DataSource, bulk import/export, and a machine learning API.
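
The idea of sending each query to the engine that suits it can be sketched in a few lines of Python. This is not Splice Machine’s actual routing logic: the `QueryPlan` class, the `choose_engine` function and the row threshold are hypothetical, and the sketch assumes that short transactional lookups go to HBase while large analytical scans go to Spark, using an estimated row count as a stand-in for the transaction type.

```python
from dataclasses import dataclass

# Hypothetical cutoff; a real optimizer would rely on table statistics and cost estimates.
ROW_THRESHOLD = 10_000


@dataclass(frozen=True)
class QueryPlan:
    sql: str
    estimated_rows: int  # rows the query is expected to touch


def choose_engine(plan: QueryPlan, threshold: int = ROW_THRESHOLD) -> str:
    """Pick an engine name based on how many rows the plan is expected to touch."""
    if plan.estimated_rows < 0:
        raise ValueError("estimated_rows must be non-negative")
    # Small, targeted work is treated as transactional; large scans as analytical.
    return "hbase" if plan.estimated_rows < threshold else "spark"


lookup = QueryPlan("SELECT * FROM orders WHERE id = 42", estimated_rows=1)
report = QueryPlan("SELECT region, SUM(total) FROM orders GROUP BY region", estimated_rows=5_000_000)
engines = [choose_engine(lookup), choose_engine(report)]
```

The point of the sketch is that the application issues plain SQL in both cases; the choice of engine happens behind the service layer rather than in the user’s tooling.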

At the end of the day, this streamlined architecture greatly reduces complexity and the need to ‘duct tape’ various solutions together. Splice Machine also eliminates the need for staffing up on infrastructure engineers, cuts data movement by more than ten hours, and reduces infrastructure costs by at least 30 percent.

Building the Right Data Lake for Sonoco

Sonoco is a $5 billion packaging company specializing in paper and industrial products, consumer packaging, and protective solutions. The company has multiple ERP systems and relies heavily on data to fulfill its orders. After a period of significant growth, Sonoco knew it needed to advance its digital transformation and adopt a new data platform.

Their vision was to have a faster, more robust compute and storage platform that could support transactional and BI workloads, real-time events, rich analytics and data science – all while being cost-effective. However, after a year of research, they were acutely aware of the steep challenges, ones that their small IT staff was not equipped to handle; these included the complexity of the Hadoop ecosystem, and the multiple tools and lower-level development required to support it.

A Solution from Diyotta and Splice Machine

Charlotte, North Carolina-based Diyotta offers a single integrated solution that helps organizations access data across a variety of platforms, whether it’s at rest, streaming or on premises.

Their platform consists of two main components: a single Controller and Agents. The Controller feeds instructions to the Agents, which sit atop the various source systems and allow data to flow from them into the target platform. Diyotta’s unique process allows data to be ingested very quickly; what would have taken days before can now take only hours.
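
As a rough mental model of the Controller and Agent split, consider the following Python sketch. It is not Diyotta’s API: the `Agent` and `Controller` classes, the instruction format and the in-memory "systems" are all hypothetical, and the sketch assumes an Agent reads from the system it sits on and writes to a target chosen by the Controller.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class Agent:
    """Sits next to one system and moves its rows when instructed."""
    name: str
    source: List[Row]

    def run(self, instruction: Dict[str, List[str]], target: List[Row]) -> int:
        columns = instruction["columns"]
        for row in self.source:
            # Copy only the requested columns into the target.
            target.append({column: row[column] for column in columns})
        return len(self.source)


@dataclass
class Controller:
    """Holds no data itself; it only tells Agents what to do."""
    agents: Dict[str, Agent] = field(default_factory=dict)

    def register(self, agent: Agent) -> None:
        self.agents[agent.name] = agent

    def dispatch(self, agent_name: str, instruction: Dict[str, List[str]], target: List[Row]) -> int:
        return self.agents[agent_name].run(instruction, target)


controller = Controller()
controller.register(Agent("erp", source=[{"id": 1, "qty": 5, "note": "rush"}]))
lake: List[Row] = []
moved = controller.dispatch("erp", {"columns": ["id", "qty"]}, lake)
```

In this model the data never passes through the Controller, which is one plausible reason such a design can ingest data quickly.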

When paired together with Splice Machine, the two platforms offer a variety of benefits:

  • Efficiently build, test, deploy and maintain dataflow pipelines using a drag-and-drop interface
  • Perform data ingestion using a native bulk loader approach or optional JDBC
  • Generate Splice Machine schema automatically based on the source system
  • Bring Splice Machine’s functions and UDFs into Diyotta’s UI and validate them at design time
  • Perform data transformation using Spark processing with Splice Machine native context

Driving ROI with Your Data Lake Investment

Splice Machine’s transactional and machine learning capabilities pair well with Diyotta’s platform. And as Diyotta’s demo shows, using the combined solution from the two companies can accelerate your company’s time-to-market with your data lake initiative. Splice Machine and Diyotta also take the complexity out of moving data from a legacy database, operationalizing it and preparing it for consumption.

At Sonoco, that vision is turning into a reality. With the data lake solution from Splice Machine and Diyotta, the company has a more cost-effective platform that takes complexity and the need for resource-intensive development out of the equation. The solution can now handle transactional and BI workloads, access data in real time and perform sophisticated analytics. Most importantly, the data platform can scale as the company grows and continues its journey of digital transformation.