When everything feels important: How MaxDiff and Conjoint sharpen product decisions

Recently, I went to a conveyor-belt hot-pot restaurant with a friend. Ingredients kept passing us on small tracks, and almost all of them looked tempting. The real challenge was not finding appealing options. It was deciding what actually deserved our appetite.

Product development often runs into the same problem. Teams say they want to be data-informed. In practice, that usually means working with a mix of analytics, stakeholder input, support tickets, business pressure, and user feedback. All of that can help. The difficulty starts when those inputs have to become a decision about what should happen next.

This is often the moment when teams reach for a standard survey. On the surface, that seems sensible. Ask users to rate a list of features on a Likert scale and you get numbers back. That kind of data can be useful, but it is often too weak for a prioritisation decision.

Where simple surveys reach their limit

A rating survey asks people to judge each feature in isolation. That is a long way from how product decisions are actually made. Several options can honestly sound important when they are viewed one by one, without cost, compromise, or competition. In a spreadsheet, the result looks structured. In a roadmap discussion, it often changes very little. The team still has to decide what deserves priority, even though many options now carry a respectable score.

Prioritisation is a trade-off problem. It lives under constraint. Time is limited, capacity is limited, and product teams rarely get to build everything they would like to build. If the research design does not reflect that pressure, the result will usually be broad approval without enough differentiation.

A simple example makes the difference easier to see. In a food delivery app, several things can sound useful at the same time: clear delivery updates, fast checkout, saved favourites, or transparent fees. None of these is meaningless, and not all of them are “features” in the narrow product sense. Some are closer to experience qualities. MaxDiff becomes more revealing once people have to choose which of these would matter most in deciding whether to use the app. It shows which of those aspects carry real weight when placed in competition.

Simple ratings flatten priorities because features are judged in isolation. MaxDiff introduces trade-offs by forcing choices between competing options.

What MaxDiff changes

Two methods from market research are worth borrowing for product work: MaxDiff, short for maximum difference scaling, and Conjoint analysis.

Think of MaxDiff as the Hunger Games for product features. Every item gets thrown into the arena, and users decide, repeatedly across rotating sets, which ones survive and which get eliminated. Unlike a rating scale, where everything can score well and go home with a participation trophy, MaxDiff forces a verdict.

Each MaxDiff set shows a rotating subset of items. Respondents pick exactly one “most” and one “least” per set, forcing a trade-off rather than allowing equal ratings.

MaxDiff introduces decision pressure. Instead of asking whether a feature is generally liked, it asks respondents to choose which option matters most and which matters least from smaller sets. That shift may seem minor, but it changes the quality of the signal. The method was originally developed in psychophysics and consumer research to address a known problem with rating scales: people use them differently. Some rate generously, some conservatively, which makes comparisons across respondents unreliable. MaxDiff sidesteps that entirely by forcing relative choices rather than absolute scores. 1
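To make the scale-use problem concrete, here is a tiny sketch with invented ratings. The two respondents below agree completely on which feature matters most and least, yet their averages look very different, which is exactly the noise a relative-choice method removes. The feature names and numbers are placeholders, not data.

```python
# A tiny illustration of scale-use bias, using invented ratings.
# Both respondents rank the same three features in the same order,
# but one rates generously and one conservatively, so their averages
# are hard to compare. Their "most" and "least" picks agree exactly.
generous = {"transparent fees": 5, "delivery updates": 4, "fast checkout": 3}
conservative = {"transparent fees": 3, "delivery updates": 2, "fast checkout": 1}

for name, ratings in [("generous", generous), ("conservative", conservative)]:
    average = sum(ratings.values()) / len(ratings)
    most = max(ratings, key=ratings.get)
    least = min(ratings, key=ratings.get)
    print(f"{name:12s} average rating {average:.1f}  most: {most}  least: {least}")
```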

MaxDiff is also not limited to direct product features. It can compare benefits, claims, messages, or user-facing experience qualities, as long as the list is framed as a coherent set of stand-alone items rather than a mixture of different abstraction levels. Qualtrics’ guidance is explicit that MaxDiff works best with mutually exclusive items that act as offerings in and of themselves. Mix “fast checkout” with “we care about your privacy” in the same study and the results will be difficult to act on. The items are simply not competing on the same terms.

A feature can sound attractive in isolation and still lose when placed next to something users consider more valuable. MaxDiff reflects both the reality of product work, where features compete for roadmap space, and the reality of user priorities, where not everything that sounds helpful matters equally.

That matters because teams are rarely deciding between a good feature and a bad one. More often, they are deciding between several plausible options. A method that exposes relative value is far more useful than one that simply confirms broad approval.

A few things to know before you run one

I want to be upfront about something the method explainers tend to gloss over: MaxDiff takes actual effort to set up well, and shortcuts tend to show up in the results.

The sweet spot for item counts is roughly 10 to 40 items. With fewer, you do not get enough trade-off signal; with more, the study design becomes unwieldy and respondent fatigue starts distorting the data. Each set typically shows 4 to 6 items at a time, with enough rotation across sets that every item competes against every other item a meaningful number of times.
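If you want a feel for what that rotation looks like, here is a minimal sketch that deals a hypothetical list of twelve items into sets of four, three times over, so every item appears equally often. Commercial tools generate properly balanced designs; this only shows the shape of the exercise, and the item names are placeholders.

```python
import random

# A minimal sketch of MaxDiff set rotation, assuming a hypothetical list
# of 12 items shown 4 at a time, with each item appearing in 3 sets.
ITEMS = [f"item_{i:02d}" for i in range(1, 13)]  # 12 placeholder items
ITEMS_PER_SET = 4  # assumes the item count divides evenly into sets
ROUNDS = 3         # each round shows every item exactly once

def build_sets(items, items_per_set, rounds):
    sets = []
    for _ in range(rounds):
        shuffled = random.sample(items, len(items))
        # chop one shuffled pass over all items into consecutive sets
        for start in range(0, len(shuffled), items_per_set):
            sets.append(shuffled[start:start + items_per_set])
    return sets

for number, current_set in enumerate(build_sets(ITEMS, ITEMS_PER_SET, ROUNDS), start=1):
    print(f"Set {number}: {current_set}")
```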

MaxDiff produces two distinct outputs: utility scores, which show the relative preference strength of each item on a statistical scale that can be positive or negative, and preference shares, which translate those scores into percentages showing how likely each item is to be chosen as “best.” Both give you a ranked list you can segment by user group, region, or persona. That segmentation is often where the most useful insight sits. A feature that ranks third overall may rank first among your highest-value customers, which is a very different strategic signal from the aggregate number.
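As a rough illustration of how those two outputs relate, here is a toy scoring pass over made-up tallies, reusing the food delivery items from earlier as placeholders. Real platforms estimate utilities with logit or hierarchical Bayes models; the best-minus-worst counts below are a simplification that conveys the same intuition.

```python
from math import exp

# A toy scoring sketch with made-up tallies, not real data.
# For each hypothetical item: (times picked "most", times picked "least", times shown)
tallies = {
    "transparent fees": (180, 20, 300),
    "delivery updates": (150, 40, 300),
    "fast checkout":    (90, 80, 300),
    "saved favourites": (40, 160, 300),
}

# Utility-like score: net preference per exposure, can be positive or negative.
scores = {item: (best - worst) / shown for item, (best, worst, shown) in tallies.items()}

# Preference shares: rescale scores so they read as "chance of being chosen best".
total = sum(exp(s) for s in scores.values())
shares = {item: exp(s) / total for item, s in scores.items()}

for item in sorted(shares, key=shares.get, reverse=True):
    print(f"{item:18s} score {scores[item]:+.2f}  share {shares[item]:.0%}")
```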

For stable results, plan for at least 250 to 300 respondents, ideally more if you intend to cut the data by segment. 2 Running MaxDiff on 30 survey responses and presenting the output in a prioritisation meeting as definitive is a good way to make a confident-looking bad decision 🥶.

Why user context still comes first

Qualitative research has an important place early in the process. Contextual observation, usability testing, and interviews grounded in real tasks can reveal friction long before it becomes visible in operational data. They help teams see where users hesitate, what they misunderstand, what they expect, and what quietly makes an experience harder than it needs to be.

By the time a user leaves a poor CSAT rating or contacts support, the issue has usually already happened, something I wrote about in more detail in my earlier CSAT article. These signals show where frustration became visible, not what should come first.

These methods do different work, not competing work. Qualitative research builds understanding of the problem space. CSAT and support signals monitor visible pain. MaxDiff becomes useful when the team has enough context to compare options and needs a more disciplined way to decide what matters more.

Different methods help at different moments: qualitative research builds early understanding, MaxDiff and Conjoint support prioritisation, and CSAT or support signals monitor visible friction.

Where Conjoint fits

Conjoint sits close to MaxDiff, but it answers a different question.

MaxDiff is strongest when the team needs a clearer view of comparative importance across a set of individual items. Conjoint becomes more useful when the decision is about combinations. Products are rarely experienced as isolated features. People encounter bundles of choices, capabilities, service levels, pricing structures, and trade-offs. Conjoint allows teams to model that more directly.

This matters when the real question is not which feature matters most, but which combination is worth building. A team may want to understand whether users would accept a slower experience if it includes reliable human handover. Another may want to know which bundle of features justifies a premium offer. In cases like these, isolated ratings stay shallow. Conjoint is better suited because it reveals how attributes perform when packaged together. 3
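To make that concrete, here is a minimal sketch of the additive part-worth logic behind conjoint. The numbers are invented for a hypothetical delivery-app offer; in a real study they would be estimated from respondents’ choices between profiles. The point is only that every combination of attribute levels gets a total utility, which makes bundles directly comparable, including the slower-but-human-handover trade-off described above.

```python
from itertools import product

# A minimal sketch of the additive part-worth logic behind conjoint.
# The part-worths below are invented for a hypothetical delivery-app
# offer; real studies estimate them from observed choices.
part_worths = {
    "speed":    {"30 min delivery": 0.8, "60 min delivery": 0.1},
    "handover": {"human support": 0.6, "bot only": -0.2},
    "price":    {"no service fee": 0.7, "2.99 service fee": -0.4},
}

def bundle_utility(profile):
    # Additive model: a profile's utility is the sum of its attribute part-worths.
    return sum(part_worths[attribute][level] for attribute, level in profile.items())

# Enumerate every combination of attribute levels and rank the bundles.
profiles = [
    dict(zip(part_worths, levels))
    for levels in product(*(part_worths[attribute] for attribute in part_worths))
]
for profile in sorted(profiles, key=bundle_utility, reverse=True):
    print(f"{bundle_utility(profile):+.2f}  {profile}")
```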

MaxDiff surfaces the relative importance of individual items. Conjoint reveals which attribute combinations create the most value, including trade-offs between speed, price, and quality.

When research becomes a decision tool

Methods like MaxDiff and Conjoint make trade-offs visible. That matters in environments where opinions are strong, delivery pressure is constant, and the loudest voice can easily outweigh the clearest evidence. They do not solve that on their own, but they shift the discussion, turning vague approval into relative priority.

I have sat in enough prioritisation meetings to know how quickly a well-formatted spreadsheet of Likert averages becomes a proxy for a decision nobody wanted to make out loud. What these methods offer instead is a structure for the argument, one that is harder to override with a strong opinion and a senior job title.

The certainty trap

Many prioritisation discussions drag on because the research input was never designed for prioritisation in the first place. Teams ask methods to answer a harder question than they were built to handle. Then they wonder why the results feel polite, broad, and inconclusive.

Product teams rarely need more options. They need a clearer basis for deciding what comes first. MaxDiff and Conjoint can help create that basis, but they do not remove uncertainty altogether.

They still depend on how well the attributes are defined, how the study is framed, and whether the results are read honestly. Even so, when the challenge is prioritisation, they usually provide a clearer view of trade-offs than a simple rating survey can. In product work, the options keep coming. Capacity does not.

When to use which method

Use MaxDiff when you need to rank a coherent set of comparable items. Use Conjoint when the decision is about combinations rather than individual attributes. The harder work sits upstream in both cases: defining the right items, the right audience, and a task that people can complete without confusion.

And if someone on your team says a standard five-point scale will do the job just as well, ask them what they plan to do when eight features all score a 4.1 👹.

Want to go deeper?

Here are four useful starting points if you want to explore the methods in more detail:

Footnotes

1 Best-worst scaling, the mechanism underlying MaxDiff, was developed by Jordan Louviere and colleagues in the early 1990s to address the well-documented problem of scale-use bias in rating surveys. The definitive academic treatment is: Louviere, J., Flynn, T., & Marley, A. (2015). Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press.

2 Sample size guidance specific to MaxDiff comes from Sawtooth Software, whose rule of thumb recommends at least 300 respondents in total and at least 200 per subgroup if you plan to cut the data by segment. See: Sawtooth Software, MaxDiff Sample Size Calculation Best Practices and the MaxDiff Technical Paper (2020).

3 Conjoint analysis was introduced into consumer research by Green and Rao in 1971 and has since become one of the most widely used quantitative methods in market research. For the foundational framework: Green, P.E., & Srinivasan, V. (1978). Conjoint Analysis in Consumer Research: Issues and Outlook. Journal of Consumer Research, 5(2), 103–123.

This article was created with Generative AI support.