The Secret to Real-Time ML: A Feature Store

How have billion-dollar businesses built on machine learning overcome the challenges of development, deployment and governance at scale? They devoted millions of dollars to people and technology to create a feature store.

A feature store is a centralized repository of continuously updated, raw and transformed data for machine learning. It enables better models to be created faster with reusable features and makes model and feature governance possible for explainability and transparency.

Machine Learning Is Stuck

Few businesses are deriving the full value of machine learning in their daily operations. It takes an army of data scientists to deploy ML throughout the enterprise because feature engineering remains time-consuming and dominated by mundane data tasks.

What Is Holding Machine Learning and AI Back?

  • Architectural Complexity
  • Onerous Feature Engineering
  • Hard to Manage and Govern
  • Unable to Scale Data Science
  • Lack of Lineage & Transparency

A More Powerful Feature Store

Most feature stores incur extra cost, complexity, and latency because they need to maintain both an online feature store and an offline feature store. Splice Machine is the only single-engine feature store, powered by one ACID-compliant dual OLTP/OLAP RDBMS.

The Benefits of a Single-Engine Feature Store

  • Easier to provision and operate
  • Less infrastructure cost
  • Easier to back up or replicate
  • Native triggers enable event-driven pipelines
  • No latency to synchronize
  • True ACID transactionality
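As a rough illustration of the trigger-driven idea in the list above, the sketch below uses Python's built-in SQLite module as a stand-in for a single ACID database. The table names, column names, and trigger are hypothetical and are not Splice Machine's schema or API; the point is only that a raw event and the feature derived from it can be written in one transaction, with nothing left to synchronize.

```python
import sqlite3

# Illustrative only: SQLite stands in for a single ACID store.
# All table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (customer_id INTEGER, amount REAL);
CREATE TABLE customer_features (
    customer_id INTEGER PRIMARY KEY,
    txn_count   INTEGER NOT NULL DEFAULT 0,
    total_spend REAL    NOT NULL DEFAULT 0
);
-- The trigger refreshes the feature row whenever a raw event arrives.
CREATE TRIGGER update_features AFTER INSERT ON transactions
BEGIN
    INSERT OR IGNORE INTO customer_features (customer_id)
    VALUES (NEW.customer_id);
    UPDATE customer_features
       SET txn_count   = txn_count + 1,
           total_spend = total_spend + NEW.amount
     WHERE customer_id = NEW.customer_id;
END;
""")

# Raw events and the features derived from them commit together.
with conn:
    conn.execute("INSERT INTO transactions VALUES (1, 20.0)")
    conn.execute("INSERT INTO transactions VALUES (1, 5.5)")
    conn.execute("INSERT INTO transactions VALUES (2, 7.0)")

features = conn.execute(
    "SELECT customer_id, txn_count, total_spend "
    "FROM customer_features ORDER BY customer_id"
).fetchall()
print(features)
```

Because the trigger runs inside the same transaction as the insert, a reader never sees a raw event without its updated feature values.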

“We had this sort of a feature store at Airbnb, but it was limited by the fact that we were largely on HDFS. It enabled users to share features, but it didn’t solve the online/offline problem. But the solution can obviously be much more elegant if you start with a more amenable database that can function in realtime. Splice Machine seems to be doing exactly that – MLflow integration, database re-injection, Spark lazy loading, easy deployment, and API-less access.”

– Robert Yi, CDO at Dataframe and former Airbnb data scientist

The Fastest Way to a Feature Store

As the provider of the only scale-out SQL RDBMS with built-in machine learning, Splice Machine has driven advancements that others did not think possible. Unlike other feature stores, the Splice Machine Feature Store is built on a single database. This delivers simplicity, scalability, and speed, both in implementation and operation.

By choosing the Splice Machine Feature Store over a single-cloud option, companies can avoid cloud vendor lock-in and retain the ability to host on-premises.

Key Capabilities of the Splice Machine Feature Store

  • Architectural simplicity
  • Horizontal scalability
  • Low-latency lookups
  • Point-in-time consistency for training
  • Scalable in-database ELT transformations in both SQL and Python for feature pipelines
  • Event-driven and batch feature updates
  • SQL transforms
  • ACID compliance between “online” and “offline” data
  • Automatic feature history
  • SQL or Python feature retrieval
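Point-in-time consistency, listed above, means each training example sees only the feature values that were known at the moment of the labeled event, which prevents leakage from the future. The following is a minimal sketch of that lookup in plain Python; the `history` structure, the `value_as_of` helper, and the sample data are hypothetical and do not represent the Splice Machine API.

```python
from bisect import bisect_right

# Hypothetical feature history per entity: (timestamp, value) pairs,
# sorted by timestamp. A feature store would keep this history for you.
history = {
    "cust_1": [(1, 10.0), (5, 12.0), (9, 15.0)],
}

def value_as_of(entity, ts):
    """Return the latest feature value recorded at or before ts, else None."""
    rows = history.get(entity, [])
    times = [t for t, _ in rows]
    i = bisect_right(times, ts)  # number of rows with timestamp <= ts
    return rows[i - 1][1] if i else None

# Label events: (entity, event timestamp, label).
labels = [("cust_1", 4, 0), ("cust_1", 5, 1), ("cust_1", 0, 0)]

# Join each label to the feature value that was valid at event time.
training_set = [(e, ts, value_as_of(e, ts), y) for e, ts, y in labels]
print(training_set)
```

The event at timestamp 4 receives the value written at timestamp 1, not the later one written at 5, and an event that predates all history receives no value at all.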

Need More Than a Feature Store?

Splice Machine also offers ML Manager, an end-to-end machine learning platform that addresses the most labor-intensive problems in the machine learning workflow.

For example, with Splice Machine ML Manager, you can:

  • Run a real-time model that makes predictions using up-to-the-moment data
  • Deploy that model using the power of the database in one line of code
  • Schedule automatic model retraining and champion/challenger systems so the model is always trained on the most data available and improves over time
  • Track which features are being used in that model and how those features change over time
  • Create, reuse, and share new features for the existing model and have those new features automatically backfilled across the data's history
  • Define a training dataset for the model and have that training set update automatically as new data becomes available
  • Access the entirety of historical transactions, wherever they are stored, to train the model
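The champion/challenger idea mentioned in the list above can be sketched generically: a retrained model replaces the one in production only if it performs measurably better on held-out data. The example below is an assumption-laden illustration in plain Python. The function names, the accuracy metric, the `min_gain` threshold, and the toy models are all hypothetical and are not ML Manager's interface.

```python
def accuracy(model, rows):
    """Fraction of (input, label) rows the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def pick_champion(champion, challenger, holdout, min_gain=0.01):
    """Promote the challenger only if it beats the champion by min_gain."""
    champ_score = accuracy(champion, holdout)
    chall_score = accuracy(challenger, holdout)
    if chall_score - champ_score >= min_gain:
        return "challenger", chall_score
    return "champion", champ_score

# Toy threshold models standing in for the current and retrained models.
def champion(x):
    return int(x > 0.7)

def challenger(x):
    return int(x > 0.5)

# Hypothetical held-out data: (feature value, true label).
holdout = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)]

winner, score = pick_champion(champion, challenger, holdout)
print(winner, score)
```

Requiring a minimum gain before promotion is one design choice among several; it keeps noise in the holdout set from causing needless model swaps.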