15 Assumption Testing Methods for Product & User Research Teams
And how to define success and failure for your assumption tests / experiments.
The best product teams run 10-20 experiments every week. If that sounds absurd or unachievable, then you’re thinking about assumption testing all wrong…
The earlier you are in a project, the lower fidelity your experimentation method should be. Lower fidelity experiments are faster, cheaper, and lower risk (on both your personal credibility and team’s budget).
As your certainty and/or conviction in a hypothesis increases, experiments should move from ‘asking’ methods to ‘building’ methods:
This is part 2 in my series on Assumption Testing → check out part 1 for the basics of assumption testing and four PM frameworks that help you identify your assumptions.
Qualitative Research
1. User Interviews
The lowest-fidelity method of all is simply to ask customers questions that test your assumptions. Recommended resources → The Mom Test for Founders, Deploy Empathy for Product Managers.
2. User Feedback
Customer support, success and account management teams sit on a treasure trove of complaints, suggestions and feedback from customers. Some quick keyword searches can show you the right direction to start digging.
3. Ethnography
Ethnography means spending time with or observing people to learn about their lives, feelings, and habits. Observe customers attempting to complete a routine workflow or task using their existing solution (or your own product) — the more natural the situation, the better.
Quantitative Research
4. Scenario Testing
Scenario testing is a survey method that presents people with hypothetical choices and measures their preferences based on the decisions they make. Unlike rating-scale questions (eg. 1-5 stars), Scenario Tests force people to compare options directly, giving you better insight into how they might act in a real situation. Common formats include:
Pairwise Comparison — breaks a list of options into a series of head-to-head “pair votes”, measures how often each option is selected, and ranks the list based on relative importance (see the scoring sketch after this list).
Ranked Choice Voting — presents the full list of options to be ranked in order of preference.
Points Allocation — gives people a pool of credits to allocate amongst the available options however they see fit so as to measure the magnitude of their preferences.
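To make the pairwise format concrete, here’s a minimal Python sketch of the simplest scoring approach: rank each option by the share of head-to-head matchups it won. The feature names and votes are invented for illustration, and dedicated tools typically use more sophisticated ranking models than a raw win rate.

```python
# Minimal sketch: ranking options from pairwise "pair vote" results.
# The options and votes below are made up for illustration.
from collections import defaultdict

# Each tuple is one head-to-head vote: (winner, loser)
pair_votes = [
    ("Faster exports", "Dark mode"),
    ("Faster exports", "Slack integration"),
    ("Slack integration", "Dark mode"),
    ("Faster exports", "Dark mode"),
    ("Dark mode", "Slack integration"),
]

wins = defaultdict(int)
appearances = defaultdict(int)
for winner, loser in pair_votes:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

# Win rate = share of head-to-head matchups each option won
ranking = sorted(appearances, key=lambda o: wins[o] / appearances[o], reverse=True)
for option in ranking:
    print(f"{option}: {wins[option] / appearances[option]:.0%} win rate")
```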
OpinionX is a free research tool for creating scenario-based surveys for assumption testing using any of the formats mentioned above. It comes with analysis features like comparing results by customer segment and is used by thousands of product teams at companies like Google, Shopify, LinkedIn and more.
5. Paraphrase Testing
Paraphrase testing helps you figure out if people are interpreting a statement as you expected. No fancy equipment required — just show the statement alongside an open-response text box and ask people to write what it means to them. You can either count keywords or thematically analyze the answers.
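As a rough illustration of the keyword-counting option, here’s a short Python sketch that checks open-text responses against a set of keywords representing the intended interpretation. The keywords and responses are invented; thematic analysis of the real answers will catch interpretations that simple keyword matching misses.

```python
# Minimal sketch: keyword counting for paraphrase-test responses.
# The keywords and responses are invented for illustration.
from collections import Counter
import re

responses = [
    "I think it means my data is backed up automatically every night",
    "Something about security? Not really sure",
    "My files get saved to the cloud without me doing anything",
]

# Words that signal the interpretation we intended people to take away
expected_keywords = {"backup", "backed", "automatic", "automatically", "saved"}

def matched_keywords(text):
    """Return the expected keywords that appear in one response."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words & expected_keywords

hits = [r for r in responses if matched_keywords(r)]
print(f"{len(hits)}/{len(responses)} responses matched the intended interpretation")

# The most common words overall can also surface unexpected interpretations
all_words = Counter(w for r in responses for w in re.findall(r"[a-z']+", r.lower()))
print(all_words.most_common(5))
```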
6. Quant Survey
I tend to opt for Scenario Testing if I’m going to be running a survey-based assumption test, however, if your assumption relates to an objective fact (like how much time someone spends on a given task per week or the financial cost a problem represents to their team) then a traditional quantitative survey can be great. Here’s a guide explaining best practice and common pitfalls when using quant surveys.
Demand Testing
7. Ad Testing
Online ads can be used to source participants for your assumption test or to quickly A/B test concepts without needing your own website traffic. For example, you can run variations of the same image/text as Instagram ads and see which gets the highest click-through rate (a quick significance check is sketched after the examples below).
DoorDash — the founders used AdWords to test whether anyone was searching for food delivery in Palo Alto.
Unilever — we used Facebook ads with concept product images that directed people to short surveys, telling us click-through rates as well as their qualitative impressions and customer comprehension of the concepts.
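If you’re comparing ad variants by click-through rate, a quick way to check whether the gap is more than noise is a two-proportion z-test. Here’s a minimal Python sketch; the impression and click counts are invented for illustration, and a real setup would also account for budget splits and run time.

```python
# Minimal sketch: is ad variant A's click-through rate meaningfully higher than B's?
# The impression and click counts are invented for illustration.
from math import sqrt, erfc

clicks_a, impressions_a = 54, 2000   # variant A: 2.70% CTR
clicks_b, impressions_b = 33, 2000   # variant B: 1.65% CTR

p_a = clicks_a / impressions_a
p_b = clicks_b / impressions_b
p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)

# Two-proportion z-test with a pooled standard error
se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
z = (p_a - p_b) / se
p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value for a standard normal z

print(f"CTR A = {p_a:.2%}, CTR B = {p_b:.2%}, z = {z:.2f}, p = {p_value:.3f}")
```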
8. Dry Wallet
Dry Wallet tests lead customers through a checkout experience to a dead end like an “Out of Stock” message. These tests let customers prove their intention to buy without you having to invest in the product upfront. Dry Wallet tests are particularly common among ecommerce companies.
Groupon — the founders created a blog with fake deals and offers to see if people would sign up for the deals. They didn’t build the product until they proved this demand.
Buffer — the social media scheduling tool created a landing page and set of pricing plans before ever building the actual product. If you clicked through one of the pricing plans, you were told the product wasn’t ready yet and shown an email sign-up form to be notified about the future launch.
9. Fake Door
Fake Door tests are like Dry Wallet tests but focus on a specific feature rather than a mock purchase of an entirely new product.
Dropbox — created a landing page video using paper cutouts to explain the original product concept. As founder Drew Houston later said, “[the video] drove hundreds of thousands of people to the website. Our beta waiting list went from 5,000 people to 75,000 people literally overnight. It totally blew us away.”
Prototype Testing
10. Impersonator
The Product Impersonator test uses competitor components to deliver the intended product/service without the customer knowing. Product Loops offers two great examples:
Zappos — initially purchased shoes from local retailers as orders came in instead of buying inventory upfront.
Tesla — in 2003 (pre-Elon Musk), Tesla created a prototype fully-electric roadster using a heavily modified, non-functional Lotus Elise to demonstrate to prospective investors and buyers what the final design might look like.
11. Wizard of Oz
A quick way to test an assumption is to offer functionality or services to customers without actually building the process for delivering them. If the customer requests the service or pays upfront for it, you scramble to facilitate the requirements manually behind the scenes. Examples:
DoorDash — the team had no delivery drivers or system in place to process orders when they first launched; they just sent whoever on the founding team was closest to the restaurant to go pick the food up when an order was received.
Anchor — “We hired a couple of college interns, we said to them that people are going to push this magical button and say “I want to distribute my podcast” and your job is to do all that manually but to them it’s going to feel magical like it happened automatically. We just had college students submitting hundreds of thousands of podcasts.” (quote from Anchor’s VP of Product, Maya Prohovnik)
The video/tweet below talks about using ‘Wizard of Oz’ features to close enterprise deals, like a ‘Generate Report’ button that shows “Report will take 48-72 hours to compile” to the user while the report is manually created.
12. Wireframe Testing
The most common wireframe testing methods are usability tests (asking users to complete tasks using interactive prototypes), five-second tests (showing an image or video of the prototype for five seconds and asking users to explain what they saw), first-click tests (tracking where users click first when asked to complete a task), and image-based pair voting (showing two images at a time to measure preferences).
Vanta — the first version of Vanta was just a spreadsheet they shared with new customers, testing whether a templated approach to SOC 2 applications could be done. As Vanta’s founder Christina Cacioppo explains, “We started with really open-ended questions, then moved to spreadsheet prototypes, and then moved to prototypes generated with code. At the end of the six months, we started coding.”
Product Analytics
13. A/B Testing
A/B testing is a common type of variable test that splits your users into two groups who are shown two different versions of the product and tracks the differences in their usage. A/B tests are most commonly used for optimization work (finding ways to improve existing functionality) but some companies (like Spotify) often use it to test the impact of new functionality too.
A/B testing requires large sample sizes to reach statistical significance, which mostly limits its use to larger companies:
“Unless you have at least tens of thousands of users, the [A/B test] statistics just don't work out for most of the metrics that you're interested in. A retail site trying to detect changes that are at least 5% beneficial, you need something like 200,000 users.” — Ronny Kohavi (VP at Airbnb, Microsoft, Amazon) on Lenny’s Podcast
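To see roughly where figures like that come from, here’s a back-of-the-envelope sample-size calculation in Python for a two-variant conversion test. The 5% baseline conversion rate, 95% confidence and 80% power are assumptions picked for illustration; detecting a 5% relative lift under those assumptions lands in the same ballpark as the quote above.

```python
# Rough sketch: users needed per variant to detect a small conversion lift.
# Baseline rate, confidence and power below are illustrative assumptions.
from math import ceil

baseline = 0.05           # assume 5% of visitors currently convert
relative_lift = 0.05      # smallest change worth detecting: +5% relative
target = baseline * (1 + relative_lift)

z_alpha, z_beta = 1.96, 0.84   # two-sided 95% confidence, 80% power

variance = baseline * (1 - baseline) + target * (1 - target)
n_per_variant = ceil((z_alpha + z_beta) ** 2 * variance / (baseline - target) ** 2)

print(f"~{n_per_variant:,} users per variant (~{2 * n_per_variant:,} in total)")
```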
A/B testing uses the same principle as pairwise comparison (covered under ‘Scenario Testing’), except in pairwise comparison surveys, respondents are shown statements/images instead of the actual product experience.
14. Feature Adoption Rate
Most product analytics tools can track feature adoption rate. With these tools, you can ship a V1 feature and measure how many users try it as a gauge for overall interest. This is similar to how we launched our first needs-based segmentation feature for OpinionX — initially, we shipped a really simple segmentation filter that only worked on the answers to one individual question. Customers quickly asked us to expand the feature to segment more types of data simultaneously and to compare and correlate different customer segments.
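If you don’t have an analytics tool wired up yet, the underlying calculation is simple: adoption rate is the share of active users who triggered the feature’s event at least once. The event names and user IDs below are invented; in practice you’d pull this from your analytics tool or data warehouse.

```python
# Minimal sketch: feature adoption rate from raw product events.
# Event names and user IDs are invented for illustration.
events = [
    {"user_id": "u1", "event": "app_opened"},
    {"user_id": "u1", "event": "segment_filter_used"},
    {"user_id": "u2", "event": "app_opened"},
    {"user_id": "u3", "event": "app_opened"},
    {"user_id": "u3", "event": "segment_filter_used"},
    {"user_id": "u4", "event": "app_opened"},
]

active_users = {e["user_id"] for e in events}
adopters = {e["user_id"] for e in events if e["event"] == "segment_filter_used"}

print(f"Adoption rate: {len(adopters) / len(active_users):.0%} "
      f"({len(adopters)} of {len(active_users)} active users tried the feature)")
```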
15. “Ship It And See”
This is not an assumption test — it’s an “everything” test. Be prepared for a hard fall if you’ve jumped straight to shipping your idea without any experimentation, research, or assumption testing along the way…
Defining Success & Failure Upfront
Defining conditions for success before conducting an assumption test reduces confirmation bias and (even more importantly) avoids teams disagreeing over the meaning of the results.
i. Clear Purpose
The only way to define success for an assumption test is to first understand why you’re conducting this test in the first place.
The Opportunity Solution Tree excels here because its tree-based structure links everything together, and the “Four Big Risks” framework helps you articulate the specific aspect that’s riskiest. For example, if your assumption test is for a new feature, then you know what customer need/pain that solution must address (OST) and which aspect of the solution you’re most uncertain of (4BR).
Example 1: Vanta
Here’s a great example of assumption testing in action from Vanta’s founder Christina Cacioppo, taken from a recent profile by First Round Review:
In their first experiment, they went to Segment, a customer data platform, and interviewed its team to determine what the company’s SOC 2 should look like and how far away it was from getting it. “We made them a gap assessment in a spreadsheet that was very custom to them and they could plan a roadmap against it if they wanted,” she says.
Cacioppo was running a test to answer two key questions:
Could her team deliver something that was credible?
Would Segment think that it was credible? The answer to both questions ended up being “yes.” And thus, the first (low-tech) version of Vanta was born — as a spreadsheet. “It actually went quite well, so we moved on to a second company, a customer operations platform called Front,” she says.
For this experiment, Cacioppo wanted to test a new hypothesis: Could she give Front Segment’s gap assessment but not tell them it was Segment’s? Would they notice?
“We used the same controls, the same rules and best practices, and still interviewed the Front team to see where Front was in their SOC 2 journey, so it was customized in that sense,” she says. “But this test was pushing on the 'Can we productize it? Can we standardize this set of things?' And most importantly, ‘Can they tell this spreadsheet was initially made for another company?’”
They couldn’t. And then, Cacioppo got an email that sealed the deal in terms of validating her idea.
I had to block quote this whole section because it was already perfect. Christina outlines an initial assumption (can Vanta credibly provide SOC 2 compliance as a service?), tests it, updates her riskiest assumption (the application process for SOC 2 compliance can be standardized), and tests that too. These assumptions are so clearly articulated that the outcome was an obvious success or failure.
Example 2: OpinionX
6 months after launching OpinionX, we lost our one and only paying customer. Having conducted 150+ interviews up to that point, we were sure that our key problem statement — “it’s hard to discover users’ unmet needs” — was the right pain point to focus on. But losing our only customer made us question whether we had really tested this assumption…
We decided to run a quick test, where we compiled a list of 45 problem statements and shared them with 600 target customers (of whom 150 people agreed to help). Each person was shown 10 pairs of problem statements and asked to pick the one that was a bigger pain for them. Using that data, we ranked the statements from most important to least important problem. And guess what? Our key problem statement was DEAD LAST.
We had spent over a year building a product that solved the wrong problem, all because we had never tested whether the problem was a high priority for customers to solve. When we interviewed participants from this ranking test, we learned that they didn’t actually struggle to discover unmet customer needs — on the contrary, many teams felt like they were drowning in unmet needs and couldn’t figure out which ones to prioritize! We pivoted the product away from problem discovery to problem ranking. One week later, we had 4 paying customers.
ii. Comparing Segments
Teams often fail to account for how customer segments influence the results of their assumption tests. Here’s a 30-second example I made that shows how an assumption test can look like a failure at an aggregate level while also showing the opposite result when you isolate a specific customer segment:
To account for segment differences, it’s important to ask whether you’ve baked in any assumptions about customer segments into the test you’re designing. If so, you need a way to filter and/or compare results by customer segments. If that’s not possible, you should find ways to isolate segments (eg. creating identical but separate tests for each key segment) or test your segment hypothesis first.
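Here’s a toy version of that aggregate-vs-segment effect in code. Every number is invented: overall the new concept appears to lose, but isolating the (hypothetical) enterprise segment shows the opposite result.

```python
# Illustrative only: how an aggregate "failure" can hide a segment-level "success".
responses = [
    # (segment, preferred option)
    *[("enterprise", "new concept")] * 18, *[("enterprise", "current product")] * 7,
    *[("self-serve", "new concept")] * 22, *[("self-serve", "current product")] * 53,
]

def preference_rate(rows):
    """Share of respondents in `rows` who preferred the new concept."""
    return sum(1 for _, choice in rows if choice == "new concept") / len(rows)

print(f"All respondents: {preference_rate(responses):.0%} prefer the new concept")
for segment in ("enterprise", "self-serve"):
    rows = [r for r in responses if r[0] == segment]
    print(f"  {segment}: {preference_rate(rows):.0%} prefer the new concept")
```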
One final tip (taken from my guide to needs-based segmentation) is the difference between top-down and bottom-up segmentation.
In top-down segmentation, you define your key segments upfront. For example, if we’re creating an assumption test to measure feature importance, we can reasonably assume that our results will vary depending on the pricing plan each customer is on, so pricing plan is a segment we’d define before running the test.
Imagine you’re running a research project where customers rank problem statements to help you understand which is their highest priority to solve. There are a lot of data points that could impact the results — from company size and industry to seniority and job title. In cases like these, we want to collect enough data across different categories (or enrich our results with existing customer data) so that we can identify the segments that impact the results most afterward. This is bottom-up segmentation, where you let the data tell you which segments impact your results most. More info on bottom-up segmentation here.
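As a minimal sketch of the bottom-up approach, the snippet below groups (invented) test results by each candidate attribute and reports the gap between the highest- and lowest-scoring groups; the attribute with the biggest gap is the one most worth segmenting on. A real analysis would use a much larger sample and proper statistical tests rather than a raw gap.

```python
# Bottom-up segmentation sketch: which attribute splits the results most?
# Respondent attributes and answers are invented for illustration.
respondents = [
    {"company_size": "smb",        "role": "pm",         "picked_top_problem": 1},
    {"company_size": "smb",        "role": "researcher", "picked_top_problem": 1},
    {"company_size": "smb",        "role": "pm",         "picked_top_problem": 1},
    {"company_size": "enterprise", "role": "pm",         "picked_top_problem": 0},
    {"company_size": "enterprise", "role": "researcher", "picked_top_problem": 0},
    {"company_size": "enterprise", "role": "pm",         "picked_top_problem": 1},
]

def spread_by(attribute):
    """Gap between the highest- and lowest-scoring groups for one attribute."""
    groups = {}
    for r in respondents:
        groups.setdefault(r[attribute], []).append(r["picked_top_problem"])
    rates = [sum(votes) / len(votes) for votes in groups.values()]
    return max(rates) - min(rates)

for attribute in ("company_size", "role"):
    print(f"{attribute}: {spread_by(attribute):.0%} gap between its segments")
```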
Conclusion
No matter how strong our product sense is or how close we stay to our customers, many of our assumptions will prove to be incorrect. Learning how to identify the riskiest assumption in our ideas, rapidly and iteratively test those assumptions, and update our view of the customer or market is a fundamental skill for product teams. I hope this guide has given you clear next steps to help you work on your assumption testing skills!
OpinionX is a free research tool for ranking people’s priorities. Thousands of teams from companies like Google, Shopify and LinkedIn use OpinionX to test their assumptions about their customers’ preferences using choice-based ranking surveys so they can inform their roadmap, strategy and prioritization decisions with better data. Create your own assumption test on OpinionX today for free.