Artificial Intelligence

Toward Test-Driven Development for LLMs

As LLMs mature, we need structured approaches to develop with them as a team. In this post, we’ll explore a framework for test-driven development of LLM-powered applications. TDD for LLMs can not only lead to faster development and fewer errors, but also create a virtuous cycle of continuous improvement of the model as more undesirable behaviors are detected and folded into the tests.

How to teach an old model new tricks

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

Putting Responsible AI into Practice

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Developer Platform

How to measure language model performance

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Enterprise Software

Why do ML Projects Fail?

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Monolith: The Recommendation System Behind TikTok

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Developer Platform

MLOps at Industrial-Scale: Lessons from Google

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

From prompt magic to prompt engineering?

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

How do people actually operationalize ML in 2022?

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

Do You Really Need a Feature Store?

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

Test-Time Adaptation: update your model using only unlabeled test data

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

Active Surrogate Estimators: How many labels do you really need to approximate model performance?

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

MLDemon: cheaper monitoring of production models

Part of an ongoing series highlighting insights from papers that have contributed to the development of best practices for production ML

Artificial Intelligence

How personalized can products get with deep learning?

When you and I look at a software application, we see more-or-less the same thing. Maybe the order of the items presented to you in a recommendation carousel is different, but the structure of the page is the same.

Better learning by learning about learning

Responsible machine learning is like security (and maybe like product management, too)

Many machine learning practitioners advocate for **the importance of ethical AI.** But in practice, few ML teams put even basic fairness / bias checks and balances in place.

Do ML-powered products need to be designed differently than all other products?

Artificial Intelligence

What can Data-Centric AI Learn from Data and ML Engineering?

Normally, we think of the ML process as model-centric: we iterate on the model until it performs well on a given dataset. Data-centric AI inverts the model-centric process by bringing data into the iteration loop. We improve the quality of the dataset, which in turn translates to a better model.

Can we do better than "drift detection"?

Placing too much importance on "data drift" is one example of the bad advice you'll often hear about model monitoring on the internet. Drift can hurt your models, but it's not guaranteed to. There's no way to know whether a KL divergence of 0.17 will have a big impact on performance.

Fixes that Fail: Self-Defeating Improvements in Machine Learning Systems

Machine learning is undergoing a modularity revolution. ML purists are trained to appreciate end-to-end models, like a self-driving car that maps raw sensor inputs directly to motor commands and is trained directly to get from point A to point B, avoid collisions, etc. However, in the real world, ML systems are increasingly composed of several (or even thousands of) models working together.

Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

One frustrating truth about production machine learning is that models, especially large ones, tend to fail in spectacular and unexpected ways. Suppose your model makes a mistake, like answering "who is the president of the US" incorrectly, or suggesting something offensive.

What to do if you have *too much* labeled data

Your favorite AI breakthrough of the last 1-2 years was probably trained on a massive amount of web-scraped data. Until a few years ago, you needed to employ an army of annotators to label all of it. Now, researchers use techniques like self-supervised learning that give us "labels" for free as we scrape.

Introducing Gantry: The tool to iterate on machine learning-powered products

Training an ML model is easier than ever, but building an ML-powered product isn’t. ML-powered products need their own dedicated tooling stack.

You're probably monitoring your models wrong

You shipped your machine learning model, and it’s starting to interact with real users. Congratulations on not being part of a (possibly made-up) statistic about 87% of models never making it into production.

Toward continual learning systems

One of the biggest misconceptions I hear from laypeople about AI is the belief that machine learning models get smarter as they interact with the world.
