
Are Feature Stores The Next Big Thing In Machine Learning?

Feature stores manage data pipelines that transform raw data to feature values.



I’m expecting 2021 to be the year of the feature store

Mike Del Balso, CEO and co-founder of Tecton

According to a Gartner study, 85 percent of AI projects will flatline by 2022. Even carefully built machine learning models may fail to meet expectations when deployed in an enterprise setting, mainly for two reasons: inadequate data infrastructure and talent scarcity.

In the machine learning pipeline, searching for appropriate data and preparing datasets are among the most time-consuming steps. By common estimates, a data scientist spends around 80 percent of their time managing and preparing data for analysis. The gap between the demand for and supply of qualified data scientists is another pressing challenge.

Enter the feature store.

What Are Feature Stores?

A feature store allows features to be registered, discovered, and used, both in machine learning pipelines and in online applications that run model inference. It can hold large volumes of feature data and give online applications low-latency access to those features. A feature store also automates the input of data into machine learning models, and tracks and governs that data. Enterprise AI can benefit immensely from such a centralised and reproducible framework for managing the features that machine learning models depend on.
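
The register-then-serve idea can be made concrete with a minimal sketch. Everything here is an assumption made for illustration: the class `InMemoryFeatureStore`, its method names, and the feature names are hypothetical, and a Python dictionary stands in for the low-latency online store that a real product would provide.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class FeatureDefinition:
    """Hypothetical registry entry describing one feature."""
    name: str
    entity: str            # the kind of key the feature is stored under
    description: str = ""


class InMemoryFeatureStore:
    """Toy feature store: a registry plus a key-value online store."""

    def __init__(self) -> None:
        self._registry: Dict[str, FeatureDefinition] = {}
        self._online: Dict[Tuple[str, str], Any] = {}

    def register(self, definition: FeatureDefinition) -> None:
        # Registration makes the feature visible to other users.
        self._registry[definition.name] = definition

    def list_features(self) -> List[str]:
        # Discovery: return the names of all registered features.
        return sorted(self._registry)

    def write(self, feature: str, entity_id: str, value: Any) -> None:
        # Only registered features may receive values.
        if feature not in self._registry:
            raise KeyError(f"feature {feature!r} is not registered")
        self._online[(feature, entity_id)] = value

    def get_online_features(self, features: List[str], entity_id: str) -> Dict[str, Any]:
        # Serving: a dictionary lookup per feature; missing values come back as None.
        return {f: self._online.get((f, entity_id)) for f in features}


store = InMemoryFeatureStore()
store.register(FeatureDefinition("orders_7d", "customer", "Orders in the last 7 days"))
store.register(FeatureDefinition("avg_basket_value", "customer"))
store.write("orders_7d", "c42", 3)
store.write("avg_basket_value", "c42", 27.5)

feature_vector = store.get_online_features(["orders_7d", "avg_basket_value"], "c42")
print(feature_vector)
```

A production feature store would back the registry with a metadata service and the online store with a database built for fast key lookups; the sketch only shows how the three operations relate.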

In 2017, Uber changed the game with the introduction of Michelangelo, its internal machine learning platform, which included a feature store. In 2019, Gojek, in collaboration with Google Cloud, announced Feast, an open-source feature store.

The latest to join the bandwagon is Amazon SageMaker Feature Store, a fully managed, purpose-built repository from AWS. Airbnb, Twitter, Facebook, and Netflix are other major players with feature stores.

By taking over the most mundane yet time-intensive data tasks, feature stores allow data scientists to focus on model building and experimentation rather than on cleaning and managing data.

Feature stores manage data pipelines that transform raw data into feature values. These can be scheduled pipelines that aggregate large amounts of data (up to petabytes) or real-time pipelines triggered by events. Either way, the feature store serves the ‘freshest’ feature values to machine learning models.
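
The difference between the two pipeline styles can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's API: the event shape `(customer_id, purchase_amount)`, the feature "total spend", and the function names `batch_total_spend` and `on_purchase` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical raw event: (customer_id, purchase_amount)
Event = Tuple[str, float]


def batch_total_spend(events: Iterable[Event]) -> Dict[str, float]:
    """Scheduled pipeline: recompute the feature from the full event history."""
    totals: Dict[str, float] = defaultdict(float)
    for customer_id, amount in events:
        totals[customer_id] += amount
    return dict(totals)


def on_purchase(feature_values: Dict[str, float], event: Event) -> None:
    """Real-time pipeline: update the stored value when a single event arrives."""
    customer_id, amount = event
    feature_values[customer_id] = feature_values.get(customer_id, 0.0) + amount


history = [("c1", 20.0), ("c2", 5.0), ("c1", 12.5)]

total_spend = batch_total_spend(history)   # for example, run on a nightly schedule
on_purchase(total_spend, ("c2", 10.0))     # triggered by a newly arrived event

print(total_spend)
```

The scheduled job reads everything and rebuilds the values, while the event handler touches only one key, which is what keeps the served value fresh between batch runs.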

A feature store exposes APIs and UIs that show data scientists which features, pipelines, and training datasets are currently available and which are still under development. Data scientists can choose the features required for their use cases and incorporate them into their models.
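
As a rough sketch of what such a discovery API might look like, the example below filters a small catalog by status and tag. The `CatalogEntry` fields, the status values, the team names, and the `search` function are assumptions invented for illustration; real feature stores define their own metadata schemas.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record for one feature."""
    name: str
    owner: str
    status: str                    # "available" or "in_development"
    tags: Tuple[str, ...] = ()


CATALOG = [
    CatalogEntry("orders_7d", "growth-team", "available", ("orders",)),
    CatalogEntry("avg_basket_value", "growth-team", "available", ("orders", "revenue")),
    CatalogEntry("churn_risk_score", "retention-team", "in_development", ("churn",)),
]


def search(catalog: List[CatalogEntry],
           status: Optional[str] = None,
           tag: Optional[str] = None) -> List[str]:
    """Return names of features matching every filter that was supplied."""
    results = []
    for entry in catalog:
        if status is not None and entry.status != status:
            continue
        if tag is not None and tag not in entry.tags:
            continue
        results.append(entry.name)
    return results


print(search(CATALOG, status="available"))
print(search(CATALOG, tag="churn"))
```

A data scientist could run the first query to pick production-ready features and the second to see what a peer team is still building.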

Feature stores offer the following benefits:

  • One of the main challenges in implementing a machine learning model in an enterprise environment is that the features used to train the model may not match those computed in the production serving layer. A feature store provides a consistent feature set for both, enabling a smoother deployment process.
  • The feature store keeps metadata in addition to the actual features. This helps data scientists in selecting particular features that performed well on existing models.
  • Unlike traditional methods where features are developed in silos, feature stores allow sharing features and their metadata with peers. This helps in collaboration and avoids duplication.
  • In critical sectors such as finance, healthcare, and security, it becomes essential to track the lineage of the algorithms being developed. To achieve this, data scientists require visibility into the end-to-end flow of the model. A feature store exposes the data lineage of a feature, capturing how it was developed and providing insights and reports for regulatory compliance.
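
Two of the benefits above, training and serving consistency and lineage, can be sketched together. The idea is that a feature is defined once and that single definition is used on both paths. The `Feature` class, the `orders_table` source name, and the helper functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature definition that carries its own lineage metadata."""
    name: str
    source: str                           # where the raw data comes from
    transform: Callable[[Dict], float]    # the single definition of the logic
    version: int = 1


def basket_ratio(raw: Dict) -> float:
    # Average spend per order; guard against division by zero.
    return raw["total_spend"] / max(raw["order_count"], 1)


AVG_BASKET = Feature("avg_basket_value", "orders_table", basket_ratio)


def build_training_rows(feature: Feature, raw_rows: List[Dict]) -> List[float]:
    # Offline path: apply the definition to historical rows.
    return [feature.transform(row) for row in raw_rows]


def serve_online(feature: Feature, raw_row: Dict) -> float:
    # Online path: apply the same definition to one live row.
    return feature.transform(raw_row)


def lineage(feature: Feature) -> Dict[str, object]:
    # Metadata a compliance report could draw on.
    return {
        "feature": feature.name,
        "source": feature.source,
        "transform": feature.transform.__name__,
        "version": feature.version,
    }


raw = {"total_spend": 90.0, "order_count": 3}
training_value = build_training_rows(AVG_BASKET, [raw])[0]
online_value = serve_online(AVG_BASKET, raw)
print(training_value, online_value)
print(lineage(AVG_BASKET))
```

Because both paths call the same `transform`, the value a model sees in production cannot drift from the one it was trained on through a reimplementation error, and the lineage record shows where that value came from.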

Wrapping Up

As mentioned earlier, large tech companies that work extensively with AI have built their own feature stores. The industry needs to standardise and automate the core of feature engineering, and feature stores look set to become a prerequisite in the machine learning pipeline.



Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.