How to Break Data Silos to Drive Enterprise-Wide AI
By Monte Zweben
It takes more than technology to successfully make machine learning the new internet
June 11, 2021
Not many people miss having to manually sort files, label papers, or search for lost forms in huge filing cabinets. That’s because these tasks became far easier, faster, and more pleasant once they were digitized: computers and the internet have revolutionized the way businesses approach organization and task management.
Just as computers and the internet made monotonous tasks faster and easier in every department, AI will transform work in every industry in the 21st century. Machine learning will automate away the most time-consuming and repetitive tasks across a company and offer predictions that allow businesses to make better decisions ahead of time.
The Downfall of Data Silos
Introducing these revolutionary processes takes time and specialized knowledge. However, the way most companies go about it right now is more expensive and less effective than it needs to be.
There’s an important business disconnect that is holding back the expected ROI for machine learning models. In most companies, the people who can manage large volumes of data and encode it in machine learning models sit in IT department silos with people who share their technical skillset. They are away from the action: they have only a vague idea of where the application will interact with its end user, whether that’s a customer, supplier, or employee. They are one step removed from the business, and less intimately acquainted with the most important business inputs and outcomes. The models they build reflect this distance, often relying on a variable that is less predictive than one a person closer to the business could have pointed to.
Additionally, much of the work data scientists are doing is unnecessarily repetitive. Data preparation is commonly estimated to take as much as 80% of the average data scientist’s time. They struggle to compute features, which are the data attributes that their machine learning models use as input. Importantly, many data scientists throughout a company end up slogging through the data to calculate the same features that another data scientist in the company has already computed. Not to mention the governance nightmare lurking underneath the messy data lineage that underlies new machine learning models!
Introducing: The Feature Store
All this is changing. A new technology called a feature store is providing a central location for all the data related to the machine learning lifecycle and its business benefits. By freeing up time that used to be dedicated to duplicative data prep and feature engineering, data scientists can get more models up and running with a better return.
A feature store is a central repository that stores features, data lineage, and metadata associated with all the machine learning models in a company. In essence, it is a single source of truth for all of the data science work within the organization. Being able to share and re-use features boosts data science productivity by cutting down duplicate work and making it easy for data engineers, data scientists, and ML engineers to collaborate, effectively making each machine learning model cheaper and easier to produce. If you want to learn more about why that is, there’s a more in-depth resource here.
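To make the idea concrete, here is a minimal sketch of the concept in Python. It is a toy in-memory illustration, not how any particular feature store product is implemented: the names `Feature`, `ToyFeatureStore`, `avg_order_value`, and the example models and record are all hypothetical. It shows one feature being registered once, along with its lineage metadata, and then reused by two different models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Feature:
    """One registered feature plus the metadata a team needs to reuse it."""
    name: str
    compute: Callable[[dict], float]  # derives the value from a raw record
    source: str                       # lineage: where the raw data comes from
    owner: str                        # who registered the feature
    used_by: List[str] = field(default_factory=list)  # models consuming it


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a central feature repository."""

    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}

    def register(self, feature: Feature) -> Feature:
        # If the feature already exists, hand back the shared definition
        # instead of letting a second team duplicate the work.
        existing = self._features.get(feature.name)
        if existing is not None:
            return existing
        self._features[feature.name] = feature
        return feature

    def get_vector(self, model: str, names: List[str], record: dict) -> List[float]:
        # Build a model's inputs from shared features and note who used them.
        vector = []
        for name in names:
            feature = self._features[name]
            if model not in feature.used_by:
                feature.used_by.append(model)
            vector.append(feature.compute(record))
        return vector

    def lineage(self, name: str) -> dict:
        # Metadata that supports governance questions about a feature.
        feature = self._features[name]
        return {
            "source": feature.source,
            "owner": feature.owner,
            "used_by": list(feature.used_by),
        }


store = ToyFeatureStore()
store.register(
    Feature(
        "avg_order_value",
        lambda r: r["total_spend"] / r["order_count"],
        source="orders table",
        owner="marketing pod",
    )
)

record = {"total_spend": 300.0, "order_count": 4}
churn_inputs = store.get_vector("churn_model", ["avg_order_value"], record)
fraud_inputs = store.get_vector("fraud_model", ["avg_order_value"], record)
```

In this sketch, the second model gets the feature without anyone recomputing it, and `store.lineage("avg_order_value")` reports where the data came from and which models depend on it. A production feature store would add persistence, versioning, and serving, which are left out here.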
Data Science Pods
Now that feature stores are housing all the relevant ML data, data workers can easily collaborate, even when they’re not in the same space. With key parts of the ML process automated, building each new model takes far less time and money.
Even though feature stores are incredibly powerful tools, they are ultimately still tools, which means how they’re used will influence how helpful they are. Even with a feature store bridging the gaps between data workers, a “siloed” data science structure makes it hard to truly gain the enterprise-wide benefits of AI.
On its own, the AI or data science team simply does not know enough about the business, or about the applications that will deploy its models, to optimize the production operations that deliver business outcomes. To overcome this, the secret sauce to a successful AI implementation is diversity. Data scientists need to work side by side with people who know the business and the application from inception to completion.
To truly experience the benefits AI is poised to bring, businesses need to implement a more integrated tech structure so the models they invest so heavily in are measuring, predicting, and ultimately delivering the business outcomes that executives want. The best way to do this is to move past the old IT silo structure and instead pilot data science “pods” that are embedded in each department of the business. That way, every branch of a company can benefit from the insights and automation machine learning provides.
With feature stores automating and centralizing key parts of the machine learning process, data scientists can focus on collaborating with the departments that can define key business inputs and outcomes, so companies receive the AI ROI they’ve been waiting for. Implementing an integrated data science “pod” organizational structure will revolutionize the returns machine learning models are able to offer to employees and shareholders alike. Feature stores are finally allowing for the democratization of machine learning in every part of a business, and twenty years from now, we won’t remember what it was like without them.
To learn more about how feature stores work and take advantage of the other benefits they can bring to your business, check out our feature store white paper.