Five Reasons To Make Splice Machine Your New Database Engine
February 22, 2017
By: Mandar Pimpale
Modern applications are becoming increasingly data intensive, demanding a mixed workload of both OLAP and OLTP queries. Such web, mobile or e-commerce applications, more intelligent and complex in nature, call for the analysis of real-time data rather than yesterday’s data. This is a natural progression in data management, and it is fast becoming a de facto standard, so much so that Gartner has started to track data engines that support such mixed workloads and categorizes them as HTAP (Hybrid Transactional/Analytical Processing). Splice Machine is one of the top HTAP systems in the market today, and here are five main reasons why you will want your applications to be backed by Splice Machine.
1) No ETL
Traditional database technologies have segregated OLTP and OLAP workloads because of the differences in their inherent nature and in their query processing styles. Because of this segregation, data had to be moved from OLTP to OLAP engines with ETL. However, these data pipelines can be complicated, and they make real-time data analysis impossible. After OLTP data is transformed and streamed into various data engines, each of them has to be maintained and kept in sync, which is a difficult and error-prone task. It also escalates the cost of the data platform for these modern applications.
For example, consider an OLAP cube built using a traditional columnar database. Although it is a good fit for OLAP workload analysis, such a system will have a hard time processing transactional data that arrives in small increments at high volume and velocity. It has to batch the ingestion at fixed intervals, typically measured in hours, which introduces a big delay in real-time data analysis. Alternatively, it requires a different strategy, such as splitting the fact data into multiple cubes (historic and current), which further complicates the logic of web or mobile applications.
Splice Machine resolves this issue by removing ETL and splicing OLTP and OLAP into one tight machine. A single engine now handles the mixed workload, eliminating the time loss and complications. Under one hood, it can process both row and columnar style workloads. It stores the transactional data in HBase, whose row store provides a good key-value store for transactional workloads, and it runs analytics as Spark jobs. The latest release also supports the compressed columnar formats Apache Parquet and ORC, which enables in-place columnar data analysis.
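To make the "no ETL" idea concrete, here is a minimal sketch in which small transactional writes and an analytical aggregate run against the same live table, with no copy step in between. It uses Python's built-in sqlite3 module purely as a stand-in; it is not Splice Machine, and the orders table and the function names are assumptions made up for illustration.

```python
import sqlite3

def record_order(conn, customer, amount):
    """OLTP-style write: one small transactional insert."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount),
        )

def revenue_by_customer(conn):
    """OLAP-style read: aggregate over the same table the writes go to."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)

# Hypothetical schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)

record_order(conn, "alice", 20.0)
record_order(conn, "bob", 5.0)
record_order(conn, "alice", 10.0)

# The aggregate sees every committed order immediately; nothing was exported.
print(revenue_by_customer(conn))
```

In a segregated setup, the aggregate would run on a separate store that only sees new orders after the next ETL batch.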
2) Old Faithful SQL
When NoSQL engines started emerging a few years ago, many of us got the impression that SQL was becoming old-school. However, we find that it still rocks our database world. This is mainly because of its rich set of capabilities and because SQL has been around for decades, so skilled practitioners are readily available in the market. An HTAP system that speaks ANSI SQL is therefore both simple and desirable. To that end, the Splice engine supports standard ANSI SQL-99 with a rich set of window functions, on a scale-out architecture that handles petabytes of data on commodity hardware at a good price-performance ratio.
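As an illustration of the kind of window function mentioned above, the sketch below computes a per-region running total with SUM(...) OVER (PARTITION BY ... ORDER BY ...). It runs against Python's built-in sqlite3 module as a stand-in (window functions need SQLite 3.25 or later), not against Splice Machine; the sales table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 10), ("east", 2, 20), ("east", 3, 5),
        ("west", 1, 7), ("west", 2, 3),
    ],
)

def running_totals(conn):
    """Return (region, day, amount, running_total) rows.

    The window restarts for each region and accumulates in day order.
    """
    return conn.execute(
        "SELECT region, day, amount, "
        "       SUM(amount) OVER (PARTITION BY region ORDER BY day) "
        "           AS running_total "
        "FROM sales ORDER BY region, day"
    ).fetchall()

for row in running_totals(conn):
    print(row)
```

Without a window function, the same result typically needs a self-join or application-side bookkeeping.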
3) ACID Matters
Many databases allow fast transactions without an ACID guarantee, or with only a partial ACID promise. This means that the application logic has to be aware of the shortcoming and handle issues such as partial data writes or data inconsistency whenever they occur. This makes the application development process slow and unreliable. The code begins to look like spaghetti, and the chances of non-deterministic bugs increase, which turns out to be costly later in the cycle. Therefore, it is best to push the responsibility for atomic and durable transactions down to the database and get flexible, clean, less error-prone application code in return.
Splice Machine, besides offering full ACID compliance, supports a hierarchical distributed snapshot isolation system. It is also compatible with native Oracle PL/SQL syntax, so the PL/SQL workload can be easily offloaded without migration changes in the application logic.
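The benefit of pushing atomicity down to the database can be sketched with a simple funds transfer. The example uses Python's built-in sqlite3 module as a stand-in for any ACID-compliant SQL engine; it does not show Splice Machine's API, and the accounts table and the transfer function are assumptions for illustration. The credit is deliberately applied before the debit so that a failed debit leaves a partial write for the database to undo.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back along with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The application code contains no compensation logic for the half-finished transfer; the rollback is the database's job.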
4) Low Latency with High Concurrency
Splice Machine is a SQL-on-Hadoop database with upcoming support for DBaaS in the cloud. The benefit of the HBase data store is that it can grow to many petabytes while keeping access times fast. It offers high availability and auto-sharding, with no downtime and no data loss. Splice Machine is built to handle all kinds of complex workloads on large data sets. Using a cost-based optimizer, Splice Machine can direct each query in a mixed workload to either Apache HBase or Apache Spark.
At a high level, the following happens when a query is issued. First, the query is parsed and an Abstract Syntax Tree (AST) is generated. Next, the cost-based optimizer evaluates various query access plans, looking for the best join orders and cardinalities. During this phase it also performs predicate pushdown and unwinds nested sub-queries. It then generates optimized bytecode that can run against either HBase or Spark. If the joins are simple, it runs the query on Apache HBase, which leverages the HBase block cache and bloom filters. If joins over large data sets are involved, it runs the query as an Apache Spark job on YARN. The Splice engine generates a smart RDD that allows fast access to HFiles and merges in any deltas from the memstore without redundancy. On the workload resource management side, it uses the Apache Spark fair scheduler, which keeps large jobs from blocking small ones and lets you view the details of both past and present jobs in a browser.
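The routing decision described above can be pictured with a toy sketch. This is not Splice Machine's optimizer: the PlanEstimate type, the choose_engine function, and the one-million-row cutoff are all hypothetical, chosen only to show the idea of sending small, simple plans to HBase and large join-heavy plans to Spark based on an estimated cost.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the article does not say how the real optimizer decides.
ROW_THRESHOLD = 1_000_000

@dataclass
class PlanEstimate:
    """Simplified stand-in for what a cost-based optimizer might estimate."""
    estimated_rows: int  # rows the plan is expected to touch

def choose_engine(plan: PlanEstimate, row_threshold: int = ROW_THRESHOLD) -> str:
    """Pick an execution engine from the plan's estimated size.

    Small plans go to the row store ("hbase"); plans at or above the
    threshold go to the batch engine ("spark").
    """
    if plan.estimated_rows >= row_threshold:
        return "spark"
    return "hbase"

# A point lookup versus a large analytical join.
print(choose_engine(PlanEstimate(estimated_rows=12)))
print(choose_engine(PlanEstimate(estimated_rows=250_000_000)))
```

A real cost model would weigh far more than a row count (join order, available indexes, statistics freshness), but the shape of the decision is the same: estimate first, then pick the engine.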
Unlike traditional databases, which build up a big backlog of jobs while a huge query is running, the Splice engine’s workload isolation logic lets small operational jobs and big analytical workloads run smoothly side by side, keeping response times fairly constant. This high concurrency is vital for operational applications.
5) Machine Learning
With the amount of data growing exponentially and computational power increasing, the need to discover patterns in new, incoming data is natural. The benefits of analyzing real-time data are becoming clearer every day. Businesses and government agencies need to be able to analyze the most current information when trying to detect a crisis, fraud, or potentially illegal activity. Fast-paced changes in the financial markets may drive the personalized suggestions on a stockbroker’s website. All these possibilities are now open thanks to real-time data analysis and AI libraries like MLlib and TensorFlow. AI algorithms can be easily embedded with Splice Machine. Supervised and unsupervised machine learning algorithms are iterative processes that perform faster when they sit close to the data engine. With the help of stored procedures, Splice Machine integrates with machine learning algorithms to make applications more intelligent and more valuable to their users.
Splice Machine is an open-source, dual-engine HTAP database that incorporates the proven scalability of Hadoop, standard ANSI SQL, ACID transactionality, and the in-memory performance of Spark. Armed with such an engine, you can derive new and valuable business insights from your data, with a better price-performance ratio than any other database solution.
About the author:
Mandar Pimpale is a Principal Software Engineer at Splice Machine. Prior to joining Splice Machine, he was a Principal Member of Technical Staff at Actian and held Software Engineering roles at ParAccel and IBM. When he isn’t building the next OLTP/OLAP database, he enjoys spending time with his family and playing the tabla (Indian drums).