Extra! Read All About It – Our 1.0 is Here
November 19, 2014
Exiting public beta, Splice Machine is now generally available and offers new features for enterprise readiness, enhanced integration with the Hadoop ecosystem, and advanced analytical capabilities such as Window Functions.
We’re thrilled to announce the release of Splice Machine version 1.0. We’ve worked closely with several of our charter customers and partners, including Cloudera, MapR and Hortonworks, to determine the features needed to make our Hadoop RDBMS enterprise-ready.
We have added numerous important enhancements in version 1.0, described in the sections below.
Splice Machine v1.0 delivers enterprise-level features required by most companies for production deployment:
- Native Backup and Recovery: Ensures business continuity and protects against user errors by providing support for a fully transaction-aware backup. This hot backup capability allows applications and workloads continued availability while the backup of the database is in progress. Splice Machine also provides support to restore a database in a similar transactionally-aware fashion from a backup.
- Authentication: Provides support for user authentication and supports FIPS-compliant password encryption algorithms, including SHA-512 (default). Splice Machine v1.0 also supports integration with the LDAP v3 standard, allowing users to be validated against an LDAP-supported directory service.
- Authorization: Allows database administrators (DBAs) to create new users for access to the database. In addition, DBAs can grant roles and privileges to users and revoke them, controlling read/write access at the table or column level.
- Parallel, Bulk Export: Facilitates the movement of query results into a spreadsheet. The new parallel, bulk export uses the power of the entire cluster and seamlessly takes the results of a query to produce a Comma Separated Values (CSV) file, which can be opened in all major spreadsheet applications.
- Upsert: Simplifies the data loading process by automatically inserting or updating data, as appropriate, from a file into a table. Import files often contain a combination of new and updated records. New records need to be loaded with INSERT statements, while updated records need to be loaded with UPDATE statements. However, it is often difficult, if not impossible, to determine which records in an import file are new and which have been updated. The upsert function removes the need to make that distinction. This data load process uses the asynchronous write pipeline, described below, to parallelize writes across all nodes.
- Management Console: Provides the ability to view explain traces for queries in the initial version of Splice Machine’s management console. For a particular query, the explain trace shows the timing of each query operation, as well as the distribution of data.
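The upsert behavior described above can be illustrated with a minimal Python sketch. This is not Splice Machine's implementation or API: the `upsert` function, the in-memory `customers` table, and the sample rows are all hypothetical and only show the insert-or-update decision made per record.

```python
def upsert(table, records, key="id"):
    """Insert records whose key is new; update records whose key exists.

    `table` is a plain dict keyed by primary key (a stand-in for a real table).
    Returns a (inserted, updated) count pair.
    """
    inserted = updated = 0
    for record in records:
        k = record[key]
        if k in table:
            table[k].update(record)   # existing row: behaves like UPDATE
            updated += 1
        else:
            table[k] = dict(record)   # new row: behaves like INSERT
            inserted += 1
    return inserted, updated


# Hypothetical data: one existing row, and an import mixing changed and new rows.
customers = {1: {"id": 1, "name": "Ada", "city": "London"}}
import_rows = [
    {"id": 1, "name": "Ada", "city": "Paris"},       # already present
    {"id": 2, "name": "Grace", "city": "New York"},  # not present yet
]
counts = upsert(customers, import_rows)
print(counts)
```

The caller never has to sort the import file into new and changed records; the key lookup makes that decision for each row.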
Integration with Hadoop Ecosystem
Splice Machine v1.0 now supports native MapReduce and HCatalog integration, allowing interoperability with major Hadoop tools such as MapReduce, Hive, and Pig:
- Native MapReduce Integration: The MapReduce API in version 1.0 allows MapReduce programs to read from or write to Splice Machine relational tables. This allows batch workloads and highly concurrent, transactional workloads to co-exist on the same cluster. For instance, you can run a MapReduce program to sift through clickstream data and store the aggregate number of clicks and impressions in a customer schema in Splice Machine.
- HCatalog Integration: Apache HCatalog, the table and storage management layer of Hadoop, provides a relational view of data in the Hadoop Distributed File System (HDFS). Customers can now view Splice Machine, Spark, and Hive tables in HCatalog, thus enabling users to query without knowing how their data is stored. For instance, you can use the Spark Machine Learning (SparkML) libraries to run machine-learning analytics on clickstream, advertising, and campaign data in Splice Machine and HDFS in order to build recommendation models.
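To make the clickstream example concrete, here is a small Python sketch of the map and reduce steps that count clicks and impressions per customer. It runs in a single process and does not use the Hadoop or Splice Machine MapReduce APIs; the event fields (`customer_id`, `type`) and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit (customer_id, (clicks, impressions)) for one event."""
    if event["type"] == "click":
        return event["customer_id"], (1, 0)
    if event["type"] == "impression":
        return event["customer_id"], (0, 1)
    return event["customer_id"], (0, 0)  # other event types are not counted


def reduce_counts(pairs):
    """Reduce step: sum clicks and impressions for each customer."""
    totals = defaultdict(lambda: [0, 0])
    for customer_id, (clicks, impressions) in pairs:
        totals[customer_id][0] += clicks
        totals[customer_id][1] += impressions
    return {cid: tuple(counts) for cid, counts in totals.items()}


# Hypothetical clickstream events.
events = [
    {"customer_id": "c1", "type": "impression"},
    {"customer_id": "c1", "type": "click"},
    {"customer_id": "c2", "type": "impression"},
    {"customer_id": "c1", "type": "impression"},
]
aggregates = reduce_counts(map_event(e) for e in events)
print(aggregates)
```

In a real job, the resulting per-customer totals would be the rows written to a relational table.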
Splice Machine v1.0 extends its SQL compliance to include SQL-2003 support for Window Functions, as well as certifying on major SQL benchmarks:
- Analytic Window Functions: These advanced functions support analytics such as running totals, moving averages, and Top-N queries. They perform calculations across a set of table rows (the window) related to the current row. Supported Window Functions include RANK(), DENSE_RANK(), and ROW_NUMBER(), as well as several other functions.
- Completed TPC-C and TPC-H Test Suites: As part of its performance and functional testing process, Splice Machine completed the TPC-C (transactional) and TPC-H (analytical) test suites. To the best of our knowledge, there are no other SQL-on-Hadoop solutions that can successfully complete both of these test suites without modifications.
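The difference between the three ranking functions named above is easiest to see on tied values. The following Python sketch mimics their standard SQL semantics over a single window ordered by a descending key; it is an illustration only, and the `rank_rows` helper and the sample sales data are hypothetical rather than part of Splice Machine.

```python
def rank_rows(rows, key):
    """Return (row, row_number, rank, dense_rank) tuples, ordered by key descending."""
    ordered = sorted(rows, key=key, reverse=True)
    results = []
    rank = dense_rank = 0
    previous = object()  # sentinel that compares unequal to any real value
    for row_number, row in enumerate(ordered, start=1):
        value = key(row)
        if value != previous:
            rank = row_number   # RANK jumps to the row position after a tie
            dense_rank += 1     # DENSE_RANK increases by one, leaving no gaps
            previous = value
        results.append((row, row_number, rank, dense_rank))
    return results


# Hypothetical sales per region; "east" and "north" tie at 300.
sales = [("east", 300), ("west", 500), ("north", 300), ("south", 100)]
ranked = rank_rows(sales, key=lambda r: r[1])
for row, row_number, rank, dense_rank in ranked:
    print(row, row_number, rank, dense_rank)
```

The tied rows share a rank under both RANK and DENSE_RANK but still receive distinct row numbers; after the tie, RANK skips a position while DENSE_RANK does not.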
Working closely with both queries and datasets from our customers, Splice Machine has made many performance improvements. There are too many to list here; however, we would like to highlight a few key improvements and milestones:
- Asynchronous Write Pipeline: By batching up writes to each server, Splice Machine was able to increase ingest rates by over 8 times, while ensuring indexes are updated transactionally.
- 1M rows/sec Ingest Rate Benchmark: Using data and queries from a digital marketing company, Splice Machine was able to ingest 1M rows/sec. This testing also demonstrated linear scaling as nodes and concurrent users were increased.
- Concurrency Benchmark: For another digital marketing company, Splice Machine ramped up to 3,000 concurrent users, generating 350K queries/sec across a variety of reads and writes, modeling a loyalty application. The ramp-up demonstrated linear scaling, while maintaining query response times consistently between 20 and 30 ms.
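The batching idea behind the write pipeline can be sketched in a few lines of Python. This is a simplified, synchronous illustration under stated assumptions, not Splice Machine's pipeline: the `BatchingWriter` class, the modulo routing of integer keys to servers, and the `send` callback are all hypothetical, and the sketch omits the asynchronous dispatch and the transactional index updates described above.

```python
from collections import defaultdict


class BatchingWriter:
    """Buffer rows per destination server and send them in batches."""

    def __init__(self, servers, batch_size, send):
        self.servers = servers
        self.batch_size = batch_size
        self.send = send                  # callable(server, rows); stand-in transport
        self.buffers = defaultdict(list)

    def write(self, key, row):
        # Assumption: integer keys routed by modulo; real systems route differently.
        server = self.servers[key % len(self.servers)]
        buffer = self.buffers[server]
        buffer.append(row)
        if len(buffer) >= self.batch_size:
            self.flush(server)

    def flush(self, server):
        rows = self.buffers.pop(server, [])
        if rows:
            self.send(server, rows)

    def flush_all(self):
        for server in list(self.buffers):
            self.flush(server)


calls = []  # records (server, number of rows) for each network call
writer = BatchingWriter(
    ["s0", "s1"], batch_size=3, send=lambda s, rows: calls.append((s, len(rows)))
)
for key in range(10):
    writer.write(key, {"id": key})
writer.flush_all()
print(calls)
```

Ten rows reach the two servers in four calls instead of ten, which is the kind of per-row overhead reduction that batching aims for.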
Continued Support for all Hadoop Platform Vendors
In addition to new features, Splice Machine continues to support all of the major distributions of Hadoop. The latest versions supported by Splice Machine include Cloudera CDH 5.1, Hortonworks HDP 2.1 and MapR 4.0.1.
For enterprise customers deploying Splice Machine v1.0, we offer the Safe Journey program, which is a proven methodology to migrate database workloads. It implements risk-mitigation best practices and leverages commercial tools that automate most of the PL/SQL conversion process.
This program encompasses comprehensive services and support, including training courses, a Kickstart Package to speed implementation, and on-demand consultants to optimize ongoing operations.
The standalone version of Splice Machine v1.0 is now available for download at https://splicemachine.com/download.
by Gaurav Kumar and Alyssa Jarrett