fbpx

Apache Derby: Why and How Splice Machine Uses It

September 15, 2015
Apache Derby

HBase-Scale-Out

 

Why do we use Apache Derby?

With over 15 years of development, Apache Derby is a Java-based SQL database. Splice Machine chose Derby because it is a full-featured ANSI SQL database, lightweight (<3MB) and embeds nicely into the HBase/Hadoop stack.  Apache Derby is an open-sourced database which Splice Machine has significantly modified to use the distributed Apache HBase as its storage layer.

Splice Machine takes advantage of these features within Derby:

  • 100% Java ANSI SQL RDBMS – CLI, ODBC/JDBC Compliant, embedded
  • Modular, Lightweight, Unicode
  • Authentication and Authorization built in (LDAP, column-level security)
  • Concurrency

What is Apache Derby?

Apache Derby is an open source relational database implemented entirely in Java. Apache Derby originally started as Cloudbase, a proprietary database, in 1996 and then was acquired by Informix and IBM. IBM open sourced Cloudbase as Apache Derby in 2004. Apache Derby has been an active Apache project with continuous development ever since. The product has a strong IBM DB2 influence, with similar syntax and features. Apache Derby also ships as Java DB in the officially supported Java JDK.

Benefits of Apache Derby include:

  • Lightweight – approximately 2.6 megabytes for the base engine and JDBC driver
  • Meets Java, JDBC and SQL standards
  • Provides a embedded JDBC driver that lets you embed Derby in any Java-based solution

Some advanced features of Apache Derby include:

  • Java Stored Procedures (possible for PL SQL can be converted to stored procedures)
  • Triggers
  • Updatable SQL Views
  • Encryption
  • Custom Functions – create your own in Java using any Java library to manipulate your data

How does Splice Machine work with Apache Derby?

The Splice Machine database works like most any other SQL database. Below is an example of how you may use Splice Machine to process a statement utilizing Apache Derby.

The user of Splice Machine uses a prepared statement, for example (ps =- conn.prepareStatement(“SELECT * FROM T WHERE ID = ?”);

  1. Look up in cache using exact text match (skip to #6 if plan found in cache)
  2. Parse using JavaCC generated parser
  3. Bind to dictionary, acquire types
  4. Optimize Plan
  5. Generate code for plan
  6. Create instance of plan

What is happening when you are going through this is that the database is going through a 4-phase plan:

  1. Parse Phase
  • Form an explicit tree of query nodes representing the statement
  1. Generate Phase
  • Generate Java byte code (an Activation) directly into an in-memory byte array
  • Loaded with special ClassLoader that loads bytes from the byte array
  • Binds arguments to proper types
  1. Optimize Phase
  • Determine feasible join strategies
  • Optimize based on cost elements
  1. Execute Phase
  • Instantiates arguments to represent specific statement state
  • Expressions are methods on Activation
  • Trees of ResultSets generated that represent the state of the query

Splice Machine Modifications to Apache Derby:

Splice Machine has made a number of changes to Derby to support our RDBMS on Hadoop. A number of these changes are listed below:

Apache Derby Component Apache Derby Splice Machine Version
Store Block file-based HBase Tables
Indexes B-Tree Dense Index in HBase Table
Concurrency Lock-based, Aries MVCC-Snapshot Isolation
Project-Restrict Plan Predicates on centralized file scanner Predicates pushed to shards and locally applied
Aggregation Plan Aggregation serially computed Aggregations pushed to shards and spliced together
Join Plan Centralized Hash and NIJ chosen by optimizer Distributed Broadcast, Sort-merge, Merge, Nested Loop Joins and Batch Nested Loop Joins chosen by optimizer
Resource Management Number of Connections and Memory Limitations Task Resource Queues and Write Governor

Related Content:

White Paper: Learn more about how Splice Machine with our White Paper

Video: Using HBase Coprocessors to Build a Distributed, Transactional RDBMS

Blog: Find out more about our use of HBase

* Source: Derby http://db.apache.org/derby/