Mastered Data Lakes: How You Can Simplify Data Pipelines and Run Mission-Critical Applications at Scale
September 12, 2018
In the fast-paced, competitive environment of real estate, data can be a company’s most valuable asset. This is certainly true for Ten-X, the largest online real estate market in the U.S. Over the past eleven years, the Company has had $53 billion in property sales and holds the Guinness World Record for the largest verified e-commerce transaction in history (for a cool $96 million).
Behind the scenes, Ten-X realized the value of data and put it to work to drive sales. “Data is really our lifeblood,” says Jeff Klagenberg, senior director of enterprise data management for Ten-X.
Its sophisticated data lake powers mission-critical pieces of its workflow, including bidding systems, recommendation engines, and 360-degree views of its properties. It also helps match buyers with sellers, “to make sure they both have the best information possible,” adds Klagenberg.
Building the Ideal Data Lake
Much like a water treatment plant, Ten-X’s current data pipeline first takes in raw data, then filters and cleanses it. Then, it conforms, matches and merges the data, and stores it canonically where it can be consumed by downstream systems later.
This systematic process greatly simplifies the data pipeline. Ten-X was already ahead of the curve with the structure and organization of its data lake.
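The stages described above can be sketched in a few lines. This is a minimal, illustrative sketch, not Ten-X's actual code; the stage names, record shape, and field names are assumptions made for the example.

```python
# Illustrative sketch of the pipeline stages: filter/cleanse raw records,
# then conform, match, and merge them into a canonical store.

def filter_and_cleanse(records):
    """Drop malformed records and normalize fields."""
    return [
        {**r, "address": r["address"].strip().title()}
        for r in records
        if r.get("address") and r.get("price") is not None
    ]

def conform_match_merge(records):
    """Merge records that refer to the same property, keeping the latest."""
    canonical = {}
    for r in records:
        key = r["address"]
        if key not in canonical or r["updated"] > canonical[key]["updated"]:
            canonical[key] = r
    return canonical

raw = [
    {"address": " 12 main st ", "price": 250000, "updated": 1},
    {"address": "12 Main St", "price": 260000, "updated": 2},
    {"address": "", "price": 100000, "updated": 3},  # malformed: dropped
]

store = conform_match_merge(filter_and_cleanse(raw))
print(store)  # one canonical record for "12 Main St" at the latest price
```

Downstream systems then read only from the canonical store, never from the raw feeds.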
Consider an example: suppose there are 10 data streams and 3 data destinations. In a traditional, point-to-point pipeline, that means 10 × 3 = 30 connections to manage. Ten-X, by contrast, has a mastered data lake with a hub-and-spoke pipeline, so the same 10 streams and 3 destinations require a far more manageable 10 + 3 = 13 connections. The latter structure allows the data lake to perform with efficiency and power, with significantly fewer errors.
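The arithmetic behind the example is worth making explicit: point-to-point wiring grows multiplicatively with sources and destinations, while hub-and-spoke wiring grows only additively. A tiny sketch (the function names are ours, for illustration):

```python
# Connection counts for the two pipeline topologies.

def point_to_point(sources, destinations):
    # Every source is wired directly to every destination.
    return sources * destinations

def hub_and_spoke(sources, destinations):
    # Every source feeds the hub once; every destination reads from it once.
    return sources + destinations

print(point_to_point(10, 3))  # 30 connections to manage
print(hub_and_spoke(10, 3))   # 13 connections to manage
```

The gap widens quickly: at 50 sources and 10 destinations, point-to-point means 500 connections versus 60 for hub-and-spoke.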
The Need for Data Flexibility
But even with all the benefits of its beautiful, Hadoop-based data lake, Ten-X realized it still had unmet needs, including referential integrity and online transaction processing (OLTP) support. It also lacked flexibility with its data: if a seller was trying to move quickly to close a deal, a Ten-X salesperson would need the most up-to-date prices and transaction history to be effective.
Ten-X wanted to keep things simple by keeping the scale-out capabilities of the Hadoop structure it had in place, without adding another complex layer to its data lake.
One Infrastructure to Rule Them All
Ten-X found its solution in Splice Machine. Having a solution that serves as a SQL RDBMS, data warehouse and machine learning platform in one was especially attractive. And since Splice Machine supports Hadoop, Ten-X already had the internal skill set in place for managing its environment.
Splice Machine also gave Ten-X the best of both worlds in terms of data access – it could call up the most up-to-date information from its data lake in real-time, while also referring to the mastered, historical data stored in Hadoop.
And the most significant benefits? Now, Ten-X had the online analytical processing (OLAP) support it needed for its bidding, clickstream and third-party data, along with the ability to leverage predictive analytics with its data.
“We’re very happy with [Splice Machine] and see a very bright future for this type of architecture going forward,” adds Klagenberg.
Watch Splice Machine’s webinar with Ten-X on-demand below.