When you and I look at a software application, we see more or less the same thing. Maybe the order of the items presented to you in a recommendation carousel is different, but the structure of the page is the same.

As machine learning-powered products go mainstream, unpersonalized software may soon feel outdated. Every application each of us interacts with could someday behave differently depending on our unique goals and preferences.

Today, we’re covering a blog post from DoorDash that gives us a glimpse into some of the challenges we’ll face as we move to a more personalized world.

Production ML Papers to Know

This is a continuation of Production ML Papers to Know, a series from Gantry highlighting papers we think have been important to the evolving practice of production ML.

We have covered a few papers already in our newsletter, Continual Learnings, and on Twitter. Due to the positive reception, we decided to turn these into blog posts.

How personalized can products get with deep learning?

Improving homepage recommendations at DoorDash

DoorDash’s homepage has limited real estate, but a massive number of candidate restaurants and other offers to consider displaying. A perfect candidate for personalization!

Traditional recommender systems rank-order a list of similar items based on what the user is likely to engage with. For example, you might order a list of movies, or, in DoorDash’s case, a list of restaurants.

However, restaurants are not the only content displayed on DoorDash’s homepage: it also has carousels like “Most Ordered Dishes Near You” or “Now on DoorDash”. In the previous iteration of the homepage, the ordering of the carousels was fixed, and only the individual restaurants were personalized. This post explains how DoorDash moved from this mixed-personalization setup to a fully personalized page.

Challenges with full personalization

Personalizing the full content of the homepage is challenging for two reasons:

  • Ranking content across types. For example, the items being displayed on the feed could be restaurants, dishes, or carousels of restaurants and dishes. How can we have one ranking model that works across all types?
  • Maintaining diversity of recommendations. When part of the home page is hand-curated, you can manually ensure users will always be exposed to new restaurants. With a fully personalized page, the algorithm needs to make sure the “exploit” action of showing known favorites is balanced with the “explore” action of suggesting new options.

Ranking content across types by building a shared embedding space

DoorDash’s approach to ranking content across types works a bit like a language model. In NLP, training a model to directly predict a whole sentence is hard because of the combinatorial explosion of sentences created by even a small number of words. So in language modeling, we instead model the sentence as a sequence of units drawn from a much smaller vocabulary of characters (or tokens).

Similarly, DoorDash breaks the UI elements they are ranking into sequences of primitives. For example, a store is just a sequence of items. If item 1 has features f_1, then we might represent it with a padded sequence [f_1, padding, padding]. Similarly, a store with three items might be [f_1, f_2, f_3] to incorporate the features from each item.

These feature vectors are passed through a sequence model like an LSTM to map them to a shared embedding space, where they can be freely compared.
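To make the padding idea concrete, here is a minimal Python sketch. It is not DoorDash’s implementation: the two-dimensional feature vectors, the helper names (`pad_sequence`, `encode`, `cosine`), and the maximum length of 3 are all hypothetical, and a masked mean over the sequence stands in for the LSTM so that the example runs with only the standard library.

```python
import math

MAX_LEN = 3  # hypothetical maximum number of primitives per entity
DIM = 2      # hypothetical feature dimension


def pad_sequence(features, max_len=MAX_LEN, dim=DIM):
    """Right-pad a list of feature vectors with zero vectors.

    Returns the padded sequence and a mask (1 = real entry, 0 = padding).
    """
    if not 0 < len(features) <= max_len:
        raise ValueError("need between 1 and max_len feature vectors")
    n_pad = max_len - len(features)
    padded = [list(f) for f in features] + [[0.0] * dim for _ in range(n_pad)]
    mask = [1] * len(features) + [0] * n_pad
    return padded, mask


def encode(padded, mask):
    """Stand-in encoder (assumption): mean of the non-padding vectors.

    The post describes a sequence model such as an LSTM here; this
    function only illustrates "sequence in, fixed-size embedding out".
    """
    count = sum(mask)
    dim = len(padded[0])
    return [
        sum(vec[d] * m for vec, m in zip(padded, mask)) / count
        for d in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two embeddings of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up item features f_1, f_2, f_3.
f_1, f_2, f_3 = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]

item_seq, item_mask = pad_sequence([f_1])             # [f_1, padding, padding]
store_seq, store_mask = pad_sequence([f_1, f_2, f_3])  # [f_1, f_2, f_3]

item_emb = encode(item_seq, item_mask)
store_emb = encode(store_seq, store_mask)

# Both embeddings have the same size, so an item and a store can be compared.
similarity = cosine(item_emb, store_emb)
```

The point of the sketch is the shape of the data: an item and a store start out with different numbers of primitives, but after padding and encoding both are vectors of the same size and can be scored by the same ranking model.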

Maintaining diversity of recommendations through an exploration bonus

The DoorDash team designed a simple scheme to encourage the model to recommend new items. Their approach was inspired by algorithms like UCB (upper confidence bound) that manage the explore-exploit tradeoff by incorporating the uncertainty about the action into the optimization process.

Rather than simply showing the items that have the highest score according to the ranking model, they adjust the model’s scores with an uncertainty term before ranking.

Score_Comp^{c, e} is the adjusted score for user c interacting with entity e, while Score_UR^{c, e} is the raw score produced by the model. The square root term is a proxy for uncertainty; its job is to adjust the raw score to encourage exploration. Inside the square root, the numerator slowly increases the score for all entities as the log of N_c (the total impressions for user c across all entities) increases. The denominator rapidly reduces the adjustment to zero as N_{c,e} (the total number of times user c has seen this specific entity) increases.
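Written out from the description above, the adjustment takes the familiar UCB form. This is a reconstruction from the prose, not a copy of DoorDash’s equation: the additive form is inferred from the statement that the adjustment shrinks to zero, and the exploration coefficient C is an assumption added here to make the strength of the bonus explicit. See the original post for the exact expression.

```latex
\mathrm{Score}_{\mathrm{Comp}}^{c,e}
  = \mathrm{Score}_{\mathrm{UR}}^{c,e}
  + C \sqrt{\frac{\ln N_c}{N_{c,e}}}
```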

The result is that items that have a middling score according to the ranking model will be shown to the user eventually.
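A small Python sketch shows how such an adjustment can reorder a ranking. Everything here is illustrative: the function names, the coefficient `c_explore`, the example scores and impression counts, and the choice to clamp counts to at least 1 (to avoid log(0) and division by zero) are assumptions, not details from the DoorDash post.

```python
import math


def exploration_bonus(n_user, n_user_entity, c_explore=0.1):
    """UCB-style bonus: c * sqrt(ln(N_c) / N_{c,e}).

    Counts are clamped to at least 1 (an assumption) so that an entity
    with no impressions yet gets the largest finite bonus.
    """
    n_user = max(n_user, 1)
    n_user_entity = max(n_user_entity, 1)
    return c_explore * math.sqrt(math.log(n_user) / n_user_entity)


def rank(raw_scores, impressions, c_explore=0.1):
    """Sort entities by raw score plus exploration bonus, highest first."""
    n_user = sum(impressions.values())  # total impressions for this user
    adjusted = {
        entity: score
        + exploration_bonus(n_user, impressions.get(entity, 0), c_explore)
        for entity, score in raw_scores.items()
    }
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Hypothetical user: one well-known favorite, one rarely shown option.
raw_scores = {"favorite": 0.9, "new_place": 0.8}
impressions = {"favorite": 50, "new_place": 1}

with_bonus = rank(raw_scores, impressions)                    # exploration on
without_bonus = rank(raw_scores, impressions, c_explore=0.0)  # raw scores only
```

With the bonus switched off, `favorite` ranks first on its raw score. With it switched on, `new_place` moves ahead because it has been shown only once. As `new_place` accumulates impressions its bonus shrinks, and the raw score takes over again.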

The upshot

Posts like this give us a glimpse into some challenges many companies may face as they build more personalized products.

Two takeaways that apply more broadly than just homepage recommendations:

  • Sequence modeling isn’t just for language models. Many problems admit a sequence structure, and finding ways to take advantage of that structure is underrated.
  • Reinforcement learning and online adaptation don’t need to be complicated. As a first pass, you can apply ideas from these fields to adapt to the online context without retraining: just use a simple, deterministic score adjustment.

Check out the post if you want to learn more about DoorDash’s approach: https://doordash.engineering/2022/10/05/homepage-recommendation-with-exploitation-and-exploration/