So you built a great ML model. All that's left to do is deploy it into your product, right?

Not so fast. A model is just one part of a product experience designed to solve a problem for your user. And ML-powered products suffer from unique failure modes and points of user experience friction. As a result, the best ML-powered products are designed from the ground up with ML in mind.

How can we design a product that is well-suited to being powered by ML? That’s the question the authors of this week’s paper set out to study.

Production ML Papers to Know

This is a continuation of Production ML Papers to Know, a series from Gantry highlighting papers we think have been important to the evolving practice of production ML.

We have covered a few papers already in our newsletter, Continual Learnings, and on Twitter. Due to the positive reception, we decided to turn these into blog posts.

Do ML-powered products need to be designed differently from other products?

Guidelines for Human-AI Interaction

This paper proposes 18 design guidelines for ML-powered products.

Before we dive into them, let's describe the methodology for those (like me) who don't read a lot of human-computer interaction (HCI) papers.

This paper is a user study. Here, that means that the paper’s authors designed the 18 guidelines by sourcing ideas from a review of AI products and relevant papers, and evaluated them by asking 49 HCI practitioners to provide structured feedback on the guidelines’ applicability to 20 different products.

This methodology suits a paper because it's relatively objective, but the authors didn't evaluate how useful their guidelines are for actually designing new products. You'll probably pick up on this in the guidelines themselves, which can feel a bit generic. Check out Google's People + AI Guidebook for a set of guidelines that feels a bit more “practitioner-tested”.

The guidelines are categorized by when in the product lifecycle they are likely to be useful.

Guidelines that are useful initially

When users hear a system is “AI-powered”, they often have unrealistic or inflated expectations for what it can do. The “useful initially” guidelines are designed to mitigate that risk by helping users understand what the system is actually capable of.
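As a loose illustration (mine, not the paper's), here's a minimal Python sketch of one way to act on this: translate a raw model score into plain-language confidence a user can calibrate against, rather than exposing the number itself. The thresholds and helper names are illustrative assumptions.

```python
# A sketch of expectation-setting in the spirit of G2 ("make clear how
# well the system can do what it can do"). The thresholds are arbitrary
# placeholders; a real product would tune them against calibration data.

def describe_confidence(probability: float) -> str:
    """Translate a raw model probability into language users can calibrate on."""
    if probability >= 0.9:
        return "high confidence"
    if probability >= 0.6:
        return "moderate confidence -- worth double-checking"
    return "low confidence -- treat this as a rough guess"

def present_prediction(label: str, probability: float) -> str:
    """Render a prediction with a plain-language confidence band."""
    return f"Suggested label: {label} ({describe_confidence(probability)})"

print(present_prediction("invoice", 0.93))
print(present_prediction("receipt", 0.55))
```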

Guidelines that are useful during interaction

The authors recommend designing the product interaction itself to be sensitive to the context — both user-level context and societal context.

Honestly, these recommendations aren’t very AI-specific or actionable (with the possible exception of “mitigating social biases”). Maybe that’s why study participants found them “very clear” less frequently than other guidelines.

Guidelines that are useful when the model is wrong

Ok, back to actually useful guidelines.

Machine learning models get things wrong all the time. That’s one of the biggest differences between ML-powered products and traditional software systems. It’s often one of the things users (and other stakeholders) have the hardest time understanding.

To mitigate this source of friction, the authors propose guidelines that help products avoid the “system did something wrong and I have no idea why or how to fix it” problem.
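One concrete pattern (my sketch, not the paper's) is to surface the model's runner-up answers, so a wrong prediction becomes a one-step correction instead of a dead end, in the spirit of efficient correction (G9). The score format below is a hypothetical model output.

```python
# Turn a wrong prediction into a quick fix: offer the top-k alternatives
# as one-click correction choices. `scores` stands in for whatever ranked
# output your model produces.

from typing import List, Tuple

def correction_options(scores: List[Tuple[str, float]], k: int = 3) -> List[str]:
    """Return the k highest-scoring labels for the UI to offer as corrections."""
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:k]]

scores = [("spam", 0.48), ("promotion", 0.31), ("important", 0.15), ("social", 0.06)]
print(correction_options(scores))  # ['spam', 'promotion', 'important']
```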

Guidelines that are useful over time

Good ML-powered products are not static; they're updated over time to incorporate new context about the world and user preferences.

These guidelines are about making those updates a good experience for users. Some are common sense (G12, G13), but others are not as widely applied in ML products as they should be.

In particular, the guidelines recommend keeping users informed about how (G16) and when (G18) their actions are used to update the model. In my experience, few ML teams help users understand why it's worth their time to give feedback, which makes it unlikely they will do so.
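Here's a hedged sketch of what closing that loop might look like: when a user corrects a prediction, log the correction for the next model update and tell the user, in plain terms, what their feedback is for. The event schema and `log_feedback` helper are illustrative assumptions, not anything the paper specifies.

```python
# Capture a user correction and explain why it matters (in the spirit of
# G16/G18 as described above).

import json
from datetime import datetime, timezone

def log_feedback(prediction_id: str, predicted: str, corrected: str) -> str:
    event = {
        "prediction_id": prediction_id,
        "predicted": predicted,
        "corrected": corrected,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # In a real product this would be written to a feedback store that
    # feeds retraining, not printed.
    print(json.dumps(event))
    # The user-facing message explains why the feedback was worth their time.
    return "Thanks -- your correction helps improve future suggestions."

print(log_feedback("pred-123", "spam", "important"))
```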

The upshot

There are still too few resources out there about how to design ML-powered products, and this one is worth checking out.

Practically speaking, there are a few gems here (G1, G2, G8, G9, G11, G12, G15, G16, G18), and a few that feel a bit academic. I suppose that's one drawback of this style of HCI research.

Check out the full paper: Guidelines for Human-AI Interaction