What Is Splice Machine?

Splice Machine is a scale-out SQL RDBMS that combines ACID transactions, in-memory analytics, and in-database machine learning.

Our platform simplifies your architecture by seamlessly packaging a set of compute engines, saving you the time and expense of having to duct tape systems together yourself.

Low-Latency Row-Based Storage

To deliver low-latency OLTP reads and writes, Splice Machine auto-shards row-based storage across region servers. Each region server hosts multiple regions, contiguous ranges of records ordered by primary key, enabling fast point lookups and range scans. The database also features consistent secondary indexes to support additional access keys.

Splice Machine achieves fast writes by buffering them in memory and recording each write in a durable write-ahead log to protect against region server failures. When the write buffer fills up, the data is flushed to disk as multiple redundant files for high availability.
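The buffered write path described above can be sketched as follows. This is a minimal illustration only, with invented class and field names; it is not Splice Machine's actual implementation, and the lists here stand in for real durable logs and files.

```python
import json

class RegionWriter:
    """Toy sketch of a buffered write path: append to a write-ahead log
    for durability, buffer the write in memory, and flush sorted records
    to an immutable file when the buffer fills."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # stand-in for a durable write-ahead log
        self.memtable = {}       # in-memory write buffer, keyed by primary key
        self.flushed_files = []  # stand-in for redundant on-disk files
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append(json.dumps({"key": key, "value": value}))  # log first
        self.memtable[key] = value                                 # then buffer
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffer out as an immutable, key-sorted file and clear it.
        self.flushed_files.append(sorted(self.memtable.items()))
        self.memtable = {}

w = RegionWriter()
for i in range(4):
    w.put(i, f"row-{i}")
print(len(w.flushed_files), len(w.memtable))  # 1 1
```

Because every write hits the log before the buffer, a region server crash loses no acknowledged writes: the buffer can be rebuilt by replaying the log.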

Efficient Columnar Storage

Today’s modern, intelligent applications need to tap into not just up-to-the-second transactional data but also deep historical data that unlocks trends, patterns, and insights. Splice Machine’s dual model leverages columnar external tables on cost-effective storage (cloud block storage, HDFS, or local files) as Parquet, ORC, or Avro files with append-only functionality.

These external tables can be joined with first-class, low-latency row-based tables for hybrid computation. Splice Machine allows “INSERT INTO … SELECT FROM” SQL queries to create mini data marts that enable point queries and range scans against a large data warehouse on S3 or ADLS.
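The mini-data-mart pattern looks like the following. Here Python's sqlite3 stands in purely to make the SQL runnable, and the table and column names are invented for the example; in Splice Machine the source would typically be an external table over Parquet files on S3 or ADLS.

```python
import sqlite3

# In-memory database as a stand-in; the SQL pattern is what matters.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A large "warehouse" table (hypothetical schema for illustration).
cur.execute("CREATE TABLE sales_history (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales_history VALUES (?, ?)",
                [("east", 100.0), ("west", 250.0), ("east", 75.0)])

# Carve out a mini data mart for one region with INSERT INTO ... SELECT.
cur.execute("CREATE TABLE east_mart (region TEXT, amount REAL)")
cur.execute("INSERT INTO east_mart SELECT region, amount "
            "FROM sales_history WHERE region = 'east'")

cur.execute("SELECT COUNT(*), SUM(amount) FROM east_mart")
print(cur.fetchone())  # (2, 175.0)
```

Point queries and range scans then run against the small mart rather than the full warehouse.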

ACID Transactions

Splice Machine’s patented distributed transaction system maintains ACID (Atomicity, Consistency, Isolation, Durability) properties across multiple tables, records, constraints, and indexes, allowing Splice Machine to power critical applications.

Splice Machine uses a snapshot isolation design based on Multi-Version Concurrency Control (MVCC), which creates a new version of a record each time it is updated instead of overwriting the old value. This means readers see the data as it existed when they began reading, even if a writer has updated it in the meantime.

Because each transaction has its own virtual “snapshot,” transactions can execute concurrently without any locking. This yields very high throughput and avoids troublesome deadlock conditions.
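The snapshot-read behavior described above can be sketched in a few lines. This is a toy model with invented names and a simple logical clock, not Splice Machine's transaction system; it only illustrates why a reader's view is unaffected by a concurrent writer.

```python
class MVCCStore:
    """Toy multi-version store: each update appends a new version stamped
    with a logical timestamp; a reader sees the latest version at or
    before its own snapshot timestamp."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)
        self.clock = 0

    def begin(self):
        # A transaction's snapshot is just the current logical time.
        return self.clock

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot_ts):
        # Return the newest version visible to this snapshot.
        visible = [v for ts, v in self.versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("acct", 100)      # committed before the reader starts
snapshot = store.begin()      # reader takes its snapshot
store.write("acct", 50)       # a concurrent writer updates the record
print(store.read("acct", snapshot))       # 100: old snapshot is preserved
print(store.read("acct", store.begin()))  # 50: a new snapshot sees the update
```

Since the old version is never overwritten, the reader needs no lock to get a consistent answer.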

Powerful Analytical Computation

Splice Machine uses Apache Spark, a unified analytics engine for large-scale data processing, as its underlying analytical compute engine. Spark has very efficient distributed, in-memory processing that can spill to disk (instead of aborting the query) if query processing exceeds available memory. Spark is also unique in its resilience to node failures, which may occur in a commodity cluster. Other in-memory technologies abort all queries associated with a failed node, while Spark uses lineage (as opposed to replicating data) to regenerate its computation on another node.

Splice Machine analytical computation maintains ACID properties through a special integration with our underlying row-based storage. Analytical queries requiring table scans generate Spark DataFrames by reading row-based storage files directly and merging them with any changes in the write buffer that have not yet been flushed to disk. Splice Machine then uses the Spark DataFrames and Spark operators to distribute processing across Spark executors. This tight OLTP/OLAP integration enables fast HTAP (hybrid transactional/analytical processing) operations with no coding. It happens automatically.
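The merge step above, combining on-disk files with unflushed in-memory changes, can be sketched as a simple last-writer-wins merge. This is an illustrative toy with invented names: plain dicts stand in for storage files and for the DataFrame the real system would hand to Spark.

```python
def snapshot_scan(flushed_files, write_buffer):
    """Toy version of building an analytical scan: merge on-disk files
    (oldest to newest) with unflushed in-memory changes; newer data wins.
    In Splice Machine the result would back a Spark DataFrame."""
    merged = {}
    for f in flushed_files:      # apply older files first
        merged.update(f)
    merged.update(write_buffer)  # unflushed writes are newest of all
    return merged

flushed = [{"k1": "v1", "k2": "v2"}, {"k2": "v2b"}]
buffer = {"k3": "v3", "k1": "v1-updated"}
print(snapshot_scan(flushed, buffer))
# {'k1': 'v1-updated', 'k2': 'v2b', 'k3': 'v3'}
```

Because the scan folds in the write buffer, analytics see every committed write immediately, without waiting for a flush or an ETL step.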

Easy Migration

SQL Migration

Powerful tools to move from a legacy RDBMS like Oracle or IBM DB2, including full ANSI SQL and PL/SQL support.

BI Connectors

Connectors to standard BI tools such as Tableau, Power BI, and MicroStrategy allow you to migrate the enterprise reports and dashboards that users rely on every day without rewriting them.

ETL Connectors

Informatica and Attunity users can plug Splice Machine into ETL processes via connectors.

Cost-Based Optimization and Execution

The cost-based optimizer uses advanced statistics to choose the best compute engine, storage engine, index access, join order and join algorithm for each task. In this way, Splice Machine can concurrently process transactional and analytical workloads at scale without developers needing to build code to duct-tape engines together and manage computation.
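A drastically simplified version of one such decision, routing a query to an engine based on estimated cardinality, might look like this. The threshold and names are invented for illustration; a real cost-based optimizer weighs statistics, indexes, join orders, and join algorithms, not a single row count.

```python
def choose_engine(estimated_rows, oltp_threshold=10_000):
    """Toy cost-based routing rule: small, selective queries go to the
    low-latency row store; large scans go to the analytical engine."""
    return "row-store (OLTP)" if estimated_rows < oltp_threshold else "Spark (OLAP)"

print(choose_engine(42))         # row-store (OLTP)
print(choose_engine(5_000_000))  # Spark (OLAP)
```

The point is that the choice is driven by statistics at plan time, so application code never names an engine.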

Infrastructure Agnostic

Splice Machine can be deployed both on-premises and in the cloud on Amazon Web Services and Microsoft Azure. This gives enterprises the flexibility to scale as needed, on the cloud provider of their choosing, thereby avoiding the lock-in of databases available only on one vendor’s cloud platform. Plus, when deploying in the cloud, Splice Machine users have the option of consuming the platform as a service or running it in their own cloud environment.

In-Database Machine Learning

Integrated Jupyter Notebooks

An industry-standard mechanism to rapidly develop and collaborate on data science solutions.

Industry-Leading Libraries

Access to the Spark MLlib and H2O libraries, including TensorFlow deep learning integration, GLM, GBM, XGBoost, and AutoML.

Model Workflow Management

MLflow enables data scientists to track the parameters, algorithms, features, versions, and metrics of multiple experiments and model runs, making them more productive and their models more accurate.

Seamless Deployment

MLflow packages models into Docker images, which can then be deployed directly via SageMaker and Azure ML.