If you’re building out your ML stack, you’ve probably considered implementing a feature store. Used by companies like Uber, Netflix, Airbnb and Google, a feature store is often presented as a necessary part of production ML.
But this article asks whether you really need a feature store - and its answer is ‘no’. At least, not unless specific circumstances justify the additional complexity that using one will create.
This article sets out what those circumstances are, and the alternative approaches that could serve your needs just as well.
Production ML Papers to Know
Welcome to Production ML Papers to Know, a series from Gantry highlighting papers we think have been important to the evolving practice of production ML.
We have covered a few papers already in our newsletter, Continual Learnings, and on Twitter. Due to the positive reception we decided to turn these into blog posts.
Why you probably don’t need a feature store
What is a Feature Store?
The first mention of feature stores was in this Uber blog describing their ML platform, Michelangelo.
A feature store is, simply, a repository for storing and serving ML features. It is a key-value store: a client provides a timestamp and a key - an entity_id, such as a user ID - and the matching feature values are passed to the model for either training or prediction. The features can be ingested from various data sources, and transformed as necessary prior to ingestion.
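To make the key-value idea concrete, here is a minimal in-memory sketch. It is not how Michelangelo or any real feature store is implemented; the class name `InMemoryFeatureStore`, its methods, and the `trips_last_7d` feature are hypothetical, and timestamps are assumed to be plain numbers.

```python
import bisect
from collections import defaultdict


class InMemoryFeatureStore:
    """Toy key-value feature store: (entity_id, timestamp) -> feature values."""

    def __init__(self):
        # Per entity: parallel lists of timestamps and feature dicts, sorted by timestamp.
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def ingest(self, entity_id, timestamp, features):
        """Store a snapshot of already-transformed features as of `timestamp`."""
        idx = bisect.bisect_right(self._timestamps[entity_id], timestamp)
        self._timestamps[entity_id].insert(idx, timestamp)
        self._values[entity_id].insert(idx, dict(features))

    def get(self, entity_id, timestamp):
        """Return the latest features known at or before `timestamp`, or None."""
        idx = bisect.bisect_right(self._timestamps.get(entity_id, []), timestamp)
        if idx == 0:
            return None
        return self._values[entity_id][idx - 1]


store = InMemoryFeatureStore()
store.ingest("user_42", 100, {"trips_last_7d": 3})
store.ingest("user_42", 200, {"trips_last_7d": 5})
print(store.get("user_42", 150))  # the snapshot ingested at timestamp 100
```

Training code would call `get` with historical timestamps, while prediction code would call it with the current time, so both read from the same stored values.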
The problem a feature store attempts to solve is something called training-serving skew. A model that is trained on processed data needs to make predictions on production data that has been processed in an identical way - if it isn’t, we can’t be confident the model’s predictions will be as good as they were during training.
Uber’s rationale for a feature store was that it had an offline training process and an online prediction process - so an internal feature store kept both processes in sync.
But feature stores might add unnecessary complexity, and they are not the only way of addressing training-serving skew.
What are the alternatives to a Feature Store?
Let’s take a look at the alternate approaches outlined in the article.
The first, and simplest, alternative is to incorporate the preprocessing steps within the model function. Both training data and prediction data are passed to the model in a raw state; this data is processed and a prediction returned by the modeling function.
This approach is simple and versatile. Because preprocessing code is part of the model function, no extra infrastructure is required. The model can be deployed on the edge or in the cloud relatively easily.
But preprocessing steps will be repeated each time data is sent to the model, which can be computationally expensive. In addition, we reduce flexibility by having to implement the preprocessing code in the same framework as the ML model.
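A minimal sketch of this first approach, assuming a toy one-feature linear model written in plain Python. The class `TripDurationModel`, the `distance_km` field, and the log transform are all hypothetical; the point is only that `fit` and `predict` both accept raw data and share one preprocessing method.

```python
import math


class TripDurationModel:
    """Toy linear model whose preprocessing lives inside the model itself."""

    def __init__(self):
        self.weight = 0.0
        self.bias = 0.0

    def _preprocess(self, raw):
        # The only place the transformation is defined: log-scale the raw distance.
        return math.log1p(raw["distance_km"])

    def fit(self, raw_rows, targets):
        # One-variable least squares on the preprocessed feature.
        xs = [self._preprocess(r) for r in raw_rows]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(targets) / len(targets)
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
        self.weight = cov_xy / var_x
        self.bias = mean_y - self.weight * mean_x
        return self

    def predict(self, raw):
        # Raw input in, prediction out: the caller never sees the transformed feature.
        return self.weight * self._preprocess(raw) + self.bias
```

Because `_preprocess` runs on every call to `predict`, the cost noted above is visible here: an expensive transformation would be recomputed for each request.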
The second approach is to use a transform function to preprocess data prior to passing the data to the model for training or for making a prediction.
This approach requires an additional step to be inserted between the input and the model, and for this to be invoked for both the training and prediction code.
This step might be encapsulated within a container or an SQL clause, and while this adds efficiency, it can also add complexity, so this approach should only be used if the extra infrastructural and bookkeeping overhead is worth it.
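The second approach can be sketched as a single function imported by both code paths. This is an illustration under assumptions, not the article's code: the field names (`day_of_week`, `total_price`, `nights`) and the helper names are hypothetical, and `model` is assumed to be any callable that takes the transformed features.

```python
def transform(raw):
    """Single source of truth for preprocessing, called by training and serving alike."""
    return {
        "is_weekend": 1 if raw["day_of_week"] in ("Sat", "Sun") else 0,
        "price_per_night": raw["total_price"] / raw["nights"],
    }


def build_training_rows(raw_rows):
    # Training path: transform once; the result can be stored and reused across runs.
    return [transform(r) for r in raw_rows]


def handle_prediction_request(raw, model):
    # Serving path: the same transform runs before the model sees the request.
    return model(transform(raw))
```

The bookkeeping overhead mentioned above comes from keeping this one definition versioned and deployed consistently wherever training and prediction run.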
So when should we use a Feature Store?
The article contends that these two approaches should be sufficient for most features - but also that there are times when a feature store might be invaluable.
In particular, we will need a feature store if the feature value is not known by the client (for example, a mobile app), and so has to be computed on the server side and injected into prediction requests. An example is the number of visitors to a hotel, which could be a feature of a dynamic pricing model, and which will vary over time.
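A rough sketch of server-side injection for the hotel example, with a plain dictionary standing in for the feature store. The names `VISITOR_COUNTS`, `enrich_request`, and `current_visitors`, and the sample values, are hypothetical.

```python
# Server-side feature values, assumed to be refreshed by a separate job (made-up data).
VISITOR_COUNTS = {"hotel_1": 120, "hotel_2": 15}


def enrich_request(request, visitor_counts=VISITOR_COUNTS):
    """Add a feature the client cannot know before the request reaches the model."""
    enriched = dict(request)  # copy so the client's payload is left untouched
    # Falls back to 0 for unknown hotels; a real system would need a deliberate policy here.
    enriched["current_visitors"] = visitor_counts.get(request["hotel_id"], 0)
    return enriched
```

The mobile app only sends what it knows, such as the hotel ID and the number of nights; the visitor count is looked up on the server at prediction time.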
We might also need a feature store to prevent unnecessary copies of the data, such as when a feature is computationally expensive and used by multiple ML models. It might be more efficient and maintainable to store it centrally.
The diagram below illustrates an example provided in the article, where a feature used by many models - in this case, the output of an embedding algorithm - is updated daily. The models are re-trained regularly, and the feature store ensures that the embedding feature is provided efficiently, and is aligned with the training labels and timestamp required by the models.
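The alignment between a regularly updated feature and timestamped training labels is often called a point-in-time join. The sketch below shows the idea in plain Python; the function name, the tuple layouts, and the assumption that timestamps are comparable numbers are all illustrative and not taken from the article.

```python
import bisect


def point_in_time_join(labels, snapshots):
    """Attach to each label the latest feature snapshot at or before the label's timestamp.

    labels:    list of (entity_id, timestamp, label)
    snapshots: dict of entity_id -> list of (timestamp, feature), sorted by timestamp
    """
    rows = []
    for entity_id, ts, label in labels:
        history = snapshots.get(entity_id, [])
        times = [t for t, _ in history]
        idx = bisect.bisect_right(times, ts)
        if idx == 0:
            continue  # no feature value existed yet; skip rather than leak a future value
        rows.append((entity_id, ts, history[idx - 1][1], label))
    return rows
```

Each training row only ever sees the embedding that was available when its label was recorded, which is what keeps retraining consistent with what the model would have seen in production.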
In summary, a feature store is particularly useful for hard-to-compute features that are not available on the client side, are frequently updated, and are used by multiple models.
A lesson I’ve learned again and again in ML is that complexity should be earned, not assumed. ML systems are prone to bugs and long development times. It’s best to take the shortcuts you can to get a minimum viable model into production quickly, and iterate on your approach from there.
Through that lens, this article reminds us that there’s no one-size-fits-all solution to feature serving. The simpler alternatives will usually meet our requirements without the additional complexity of a feature store.
The article is available here.