Unprecedented Ease of Putting Models Into Production
The last mile of data science is now easy
A new approach: Database Deployment
It shouldn’t be hard for data scientists to put models into production. With Database Deployment:
- Put your models into production with 1 line of code
- Deploy your model as an intelligent database table: when new data is inserted, predictions are automatically generated and stored in the same table
- Permission, track, and scale model deployment with SQL, no new infrastructure required
- Tie your data to your model, no network latency or extra governance
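The idea of an intelligent table can be sketched in miniature with SQLite: a hypothetical PREDICT function stands in for the deployed model, and an insert trigger scores each new row and stores the result in the same table. This is an illustration of the concept only, not Splice Machine's actual deployment API.

```python
import sqlite3

def model_predict(amount):
    # Toy stand-in "model": flag transactions over 100 as suspicious
    return 1 if amount > 100 else 0

conn = sqlite3.connect(":memory:")
# Expose the Python model to SQL as a hypothetical PREDICT() function
conn.create_function("PREDICT", 1, model_predict)
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL, prediction INTEGER);
-- The "intelligent table": every insert is scored automatically
CREATE TRIGGER score AFTER INSERT ON transactions
BEGIN
    UPDATE transactions SET prediction = PREDICT(NEW.amount) WHERE id = NEW.id;
END;
""")
conn.execute("INSERT INTO transactions (amount) VALUES (42.0)")
conn.execute("INSERT INTO transactions (amount) VALUES (250.0)")
rows = conn.execute("SELECT amount, prediction FROM transactions ORDER BY id").fetchall()
print(rows)  # [(42.0, 0), (250.0, 1)]
```

Because the model lives behind the table, every client that can run an INSERT gets predictions with no extra serving infrastructure.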
Simplify Model Governance at Scale
Use Time Travel to View Data Lineage
- Splice Machine keeps prior versions of rows, and allows you to query the database at some point in the past when the data was in a different state. INSERT, UPDATE and DELETE operations all create new versions of rows.
- Time travel allows tables to be queried at any point in a configurable time horizon regardless of how many times a row is inserted, deleted and/or updated.
- View models back to their roots by recreating the training data as it was at the time of training. Query tables at past timestamps to gain insights into model drift and deterioration. Utilize what-if capabilities to analyze how models would perform at various points in time
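The semantics above can be pictured with a toy version store (Splice Machine implements this natively in the database engine; the class below is purely illustrative): every INSERT, UPDATE, and DELETE appends a new row version, and an `as_of(ts)` query reconstructs the table at any past timestamp.

```python
class VersionedTable:
    """Illustrative time-travel semantics: writes append versions, never overwrite."""
    def __init__(self):
        self.history = {}  # key -> list of (timestamp, value); None marks a delete

    def write(self, ts, key, value):
        self.history.setdefault(key, []).append((ts, value))

    def delete(self, ts, key):
        self.write(ts, key, None)  # a delete is just another version

    def as_of(self, ts):
        """Reconstruct the table as it existed at timestamp `ts`."""
        snapshot = {}
        for key, versions in self.history.items():
            current = None
            for t, v in versions:  # versions are appended in time order
                if t <= ts:
                    current = v
            if current is not None:
                snapshot[key] = current
        return snapshot

t = VersionedTable()
t.write(1, "row1", "v1")   # INSERT
t.write(5, "row1", "v2")   # UPDATE creates a new version
t.delete(8, "row1")        # DELETE leaves prior versions queryable
print(t.as_of(3), t.as_of(6), t.as_of(9))  # {'row1': 'v1'} {'row1': 'v2'} {}
```

Querying `as_of` at a model's training timestamp is what makes training data exactly reproducible later.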
Accelerate Machine Learning with a Feature Store
Enterprise-scale feature stores enable teams to share and collaborate on features, eliminating duplicate work
Cut Data Engineering Efforts by 80% with Shareable Features
- Avoid repeated feature engineering by creating features once and storing them in a central repository accessible to everyone on the team: a Feature Store
- Automatically keep features up to date, and ensure that models are trained on the best available data
- Easily ensure Point-in-Time consistency, dramatically reducing data leakage
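Point-in-time consistency boils down to an as-of join: each training label may only see the newest feature value observed at or before the label's own timestamp, never a future value. A minimal sketch of the idea (the function and data shapes here are hypothetical, not the Feature Store's API):

```python
def point_in_time_join(labels, feature_history):
    """Attach to each (entity, label_ts, label) the latest feature value
    observed at or before label_ts -- using a later value would leak."""
    joined = []
    for entity, label_ts, label in labels:
        value = None
        for ts, v in sorted(feature_history.get(entity, [])):
            if ts <= label_ts:
                value = v  # advance to the latest eligible version
        joined.append((entity, label_ts, value, label))
    return joined

# Hypothetical feature history: a customer risk score recomputed over time
history = {"cust_1": [(1, 0.2), (5, 0.9)]}
labels = [("cust_1", 3, "churn"), ("cust_1", 7, "stay")]
print(point_in_time_join(labels, history))
# [('cust_1', 3, 0.2, 'churn'), ('cust_1', 7, 0.9, 'stay')]
```

Note how the label at timestamp 3 sees only the 0.2 score; the 0.9 value from timestamp 5 did not exist yet and is correctly excluded.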
Automatically Update Features in Real Time
- Keep your models trained on the most up-to-date set of features
- Splice Machine’s feature store is uniquely built on a relational database: it tracks when each feature was updated and what it was updated to, so models never train on stale data
Point-in-Time Consistency
- Utilize database triggers to execute arbitrary SQL, Java or Python on an event-driven basis
- Keep all of your real-time features up to date without human intervention
Easily Track Your Experiments with Integrated MLflow
Metadata management and experiment tracking keep teams coordinated on an ongoing basis
Collaborate with Colleagues and Track Experiments
- Track all of your experimentation efforts: Store and share parameters, metrics, artifacts, and even models with a simple and intuitive API
- Seamlessly share results with colleagues through the industry standard MLflow UI
- Store every aspect of the modeling process in a durable, scalable, persistent database
- Store your experiments and models right next to the data that created them
Slash Programming Effort with Automatic Logging
- Log your entire Spark Pipeline, or all of your feature transformations from end to end, all with 1 line of code
- Use MLflow’s autolog functionality to track your Keras models with no extra instrumentation
- Focus on development, we’ll handle the tracking
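Autologging can be pictured as a decorator that records parameters and metrics as a side effect of training. The in-memory RUNS list below is a hypothetical stand-in for the MLflow tracking store; the real integration uses MLflow's own APIs.

```python
import functools

RUNS = []  # hypothetical in-memory tracking store (stand-in for MLflow)

def autolog(fit_fn):
    """Sketch of autologging: wrap a training function so its parameters
    and resulting metrics are recorded without any tracking code in it."""
    @functools.wraps(fit_fn)
    def wrapper(**params):
        metrics = fit_fn(**params)
        RUNS.append({"params": params, "metrics": metrics})
        return metrics
    return wrapper

@autolog
def train(lr, epochs):
    # Toy "training" routine that returns a metric
    return {"loss": round(1.0 / (lr * epochs), 3)}

train(lr=0.1, epochs=10)
print(RUNS)  # [{'params': {'lr': 0.1, 'epochs': 10}, 'metrics': {'loss': 1.0}}]
```

The training function itself stays free of logging calls, which is the point: developers focus on modeling while every run is captured.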
Build Models Faster with Enhanced Jupyter Notebooks
Best-in-class developer environment lets teams collaborate and use any framework
Program In Any Language, Visualize Like a Pro
- Seamlessly share variables between SQL, Python, Scala, R, and more in a single notebook
- Utilize powerful libraries like D3.js and BeakerX’s TableDisplay for real-time interactive demos and tools. Let visualizations and demos be your accelerators, not your bottlenecks
Build and Manage Models in Integrated Libraries
- Spark MLlib, H2O, Keras, TensorFlow, and scikit-learn built in
- Painless versioning: With one JupyterHub instance managing all of your Jupyter environments, you can keep all of your libraries in sync
- Automatic package governance with enhanced MLflow: models are logged with the package and Python versions used
Scale Infinitely with the Native Spark DataSource
- No limit to the size of your DataFrame
- Access your data instantly with our Native Spark DataSource: no serialization or JDBC/ODBC required
Simple Sandbox Environments
Easily Build Sandboxes
- Splice Machine Ops Center deploys a comprehensive data platform to virtually any infrastructure in minutes
- This eliminates friction between data platform administrators and data scientists: development environments stay identical to production, and production workloads are never impacted
Work in JupyterLab and JupyterHub Workspaces
- Create dedicated workspaces for each data scientist with straightforward collaboration through MLflow
- Integrate open source tools like GitHub through Jupyter Notebook and Lab extensions
- Customize each environment to the unique preferences of each teammate, and maintain those environments using the integrated Conda management system
Watch Our ML Manager Demo
Check out our webinar hosted by Splice Machine’s Ben Epstein