By: Deborah O'Malley | Last updated September, 2022
If you've been into experimentation long enough, you've likely come across the term MDE -- which stands for Minimum Detectable Effect.
The MDE sounds big and fancy, but the concept is actually quite simple when you break it down. It's the smallest effect, or conversion lift, your experiment can reliably detect.
As this GuessTheTest article explains, in order to run a trustworthy experiment -- one that's properly powered, based on an adequate sample -- it's crucial you calculate the MDE.
But not just calculate it.
Calculate it AHEAD of running the experiment.
The problem is, doing so can feel like a tricky, speculative exercise.
After all, how can you possibly know what effect, or conversion lift you want to detect from the experiment?! If you knew that, you wouldn't need to run the experiment to begin with!
Adding insult to injury, things get even hazier because the MDE is directly tied to your sample size requirements.
The larger the MDE, the smaller the sample size needed to run your experiment. And vice versa. The smaller the MDE, the bigger the sample required for your experiment to be adequately powered.
But if your sample size requirements are tied into your MDE, and you don't know your MDE, how can you possibly know the required sample size either?
The answer is: you calculate them. Both. At the same time.
There are lots of head-spinning ways to do so. This article outlines a few.
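To make the MDE-to-sample-size tradeoff concrete, here's a rough Python sketch using the standard two-proportion z-test approximation. The function name and the example baseline are my own illustration -- your calculator's exact formula may differ slightly:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-proportion z-test.

    baseline:     current conversion rate (e.g. 0.05 for 5%)
    relative_mde: smallest relative lift you want to detect (e.g. 0.10 for 10%)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 95% confidence -> ~1.96
    z_beta = NormalDist().inv_cdf(power)           # 80% power -> ~0.84
    pooled = (p1 + p2) / 2
    n = (z_alpha * math.sqrt(2 * pooled * (1 - pooled))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return math.ceil(n)

# The inverse relationship in action: halving the MDE roughly
# quadruples the required sample size.
print(sample_size_per_variant(0.05, 0.10))  # detect a 10% relative lift
print(sample_size_per_variant(0.05, 0.05))  # detect a 5% lift -> ~4x more traffic
```

Notice how the sample size explodes as the MDE shrinks -- that's the tradeoff the rest of this article walks through.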
But, if you're not mathematically inclined, here's the good news. . .
You can use a pre-test analysis calculator, like this one, to do all the hard work for you:
Now, as said, that's the good news!
The bad news is, even a calculator like this one isn't all that intuitive.
So, to help you out, this article breaks down exactly what you need to input into an MDE calculator, with step-by-step directions and screenshots so you'll be completely clear and feel fully confident every step of the way.
Let's dig in:
To work this calculator, you’ll need to know your average weekly traffic and conversion numbers.
If you’re using an analytics platform, like Google Analytics, you’ll be able to easily find this data by looking at your traffic and conversion trends.
In Google’s current Universal Analytics, traffic data can be obtained by going to the Audience/Overview tab:
It’s typically best to take a snapshot of at least 3 months to get a broader, bigger-picture view of your audience over time.
For this example, let’s set our time frame from June 1 - Aug. 31.
Now, you can decide to look at these numbers three ways:
Given these differences, calculating the total number of users will probably give you the most accurate indication of your traffic trends.
With these data points in mind, over the 3-month period, this site saw 67,678 users. There are typically about 13 weeks in 3 months, so to calculate users per week you’d divide 67,678/13 = 5,206.
In other words, the site received about 5,206 users/week.
You’d then plug this number into the calculator.
To calculate the number of conversions over this time period, you’ll need to have already set up conversion goals in Google Analytics. Here’s more information on how to do so.
Assuming you’ve set up conversion goals, you’ll next assess the number of conversions by going to the Goals/Overview tab, selecting the conversion goal you want to measure for your test, and seeing the number of conversions:
In this example, there were 287 conversions over the 3-month time period which amounts to an average of 287/13=22 conversions/week.
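Both weekly averages are simple division; as a sanity check, here's the arithmetic in Python (using the example site's numbers from above):

```python
users_3_months = 67_678        # total users over June 1 - Aug. 31
conversions_3_months = 287     # goal completions over the same period
weeks = 13                     # roughly 13 weeks in 3 months

users_per_week = round(users_3_months / weeks)              # 5206
conversions_per_week = round(conversions_3_months / weeks)  # 22
print(users_per_week, conversions_per_week)
```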
Now, imagine you want to test two variants: version A (the control, or original version) and B (the variant).
You’d now plug the traffic, conversion, and variant numbers into the calculator:
Now you can calculate your baseline conversion rate, which is the rate at which your current (control) version converts.
This calculator will automatically calculate your baseline conversion rate for you, based on the numbers above.
However, if you want to confirm the calculations, simply divide the number of goals completed by the traffic which, in this case, is 22 conversions per week/5,206 visitors per week (22/5,206=0.0042). To get a percentage, multiply this amount by 100 (0.0042*100=0.42%).
You’d end up with a baseline conversion rate of 0.42%:
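If you'd rather verify the calculator's baseline figure yourself, the same division looks like this in Python:

```python
conversions_per_week = 22
visitors_per_week = 5206

# baseline conversion rate = conversions / visitors, expressed as a percentage
baseline_rate = conversions_per_week / visitors_per_week
baseline_pct = round(baseline_rate * 100, 2)  # 0.42
print(f"{baseline_pct}%")  # 0.42%
```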
Next, plug in the confidence level and power at which you want to obtain results.
As a general A/B testing best practice, you want a confidence level of 95% or higher and statistical power of 80% or higher:
Based on these numbers, the pre-test sample size calculator indicates you’ll want to run your test for:
As a very basic rule of thumb, some optimization experts, like Ronny Kohavi, suggest setting the relative MDE up to a maximum of 5%.
If the experiment isn't powered enough to detect a 5% effect, the test results can't be considered trustworthy.
However, it's also dangerous to go much beyond 5% because, at least in Ronny's experience, most trustworthy tests don't yield more than a 5% relative conversion lift.
As such, for a mature testing organization with large amounts of traffic and an aggressive optimization program, a relative 1-2% MDE is more reasonable and is still reason to celebrate.
In the example shown above, the relative MDE was 46.43%, which is clearly above the 5% best practice.
This MDE indicates traffic is on the lower side and your experiment may not be adequately powered to detect a meaningful effect in a reasonable timeframe.
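To see why a low-traffic site ends up with such a large MDE, here's a rough standalone sketch, using a standard two-proportion z-test approximation (your calculator's exact formula may differ slightly), of what it would take to detect a 5% relative lift from this site's 0.42% baseline:

```python
import math
from statistics import NormalDist

baseline = 0.0042       # 0.42% baseline conversion rate
relative_mde = 0.05     # the 5% relative lift we'd ideally detect
weekly_traffic = 5206   # users per week from the example above

p1 = baseline
p2 = baseline * (1 + relative_mde)
z_alpha = NormalDist().inv_cdf(0.975)  # 95% confidence
z_beta = NormalDist().inv_cdf(0.80)    # 80% power
pooled = (p1 + p2) / 2
n_per_variant = math.ceil(
    (z_alpha * math.sqrt(2 * pooled * (1 - pooled))
     + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
)
# Two variants share the weekly traffic, so total needed = 2 * n_per_variant
weeks_needed = math.ceil(2 * n_per_variant / weekly_traffic)
print(n_per_variant, weeks_needed)  # well over a million users per variant
```

At this traffic level the test would need to run for years, which is exactly why the calculator pushes the detectable effect up to something like 46.43% instead.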
In this case, if you do decide to proceed with running the test, make sure to follow these guidelines:
Hope this article has been useful for you. Share your thoughts and comments below:
A useful article on explaining what's MDE, loved reading it! 🙂 What's not clear to me is: is a 5% MDE a relative change (e.g. from 10% to 10.5%) or a 5 percentage-point increase (e.g. from 10% to 15%)?
Tamas - Great question. The answer will depend on whether you're expressing the lift in relative terms (10% to 10.5% is a 5% relative increase) or absolute terms (10% to 15% is a 5 percentage-point increase). Using the calculator shown in the example, the answer is the MDE should be a RELATIVE (10% to 10.5%) MDE.
Thanks for clarifying it! 🙂
When Ronny stated the below, which one was he referring to?
"However, it's also dangerous to go much beyond 5% because, at least in Ronny's experience, most trustworthy tests don't yield more than a 5% conversion lift."
Tamas - confirmed with Ronny. He's referring to RELATIVE. You've brought up a great point, and will update the article text so there's no confusion in the future. Thanks!
The article is really good! Congratulations! I have a question about sample size calculation for continuous metric. Do you have any reference on this please?
Thanks Bruna! What, specifically, are you looking for with sample size calculations for continuous metrics?
For example, my experiment consists of optimizing billing. But when using this calculator (https://www.evanmiller.org/ab-testing/sample-size.html), which is great, the input parameters don't make sense anymore, because the "current effect" is, for example $100, and the "expected effect" is $200.
Bruna - If I'm understanding correctly, $100 would be considered the baseline. I'll assume $100 is Revenue Per Visitor (RPV). To get the conversion rate, you'd need to divide $100 by the number of visitors paying that amount. Let's say 10,000 people. So your baseline RPV conversion rate would be 1% (100/10,000=0.01, 0.01*100=1%). The relative percentage difference from $100 to $200 is the effect you're hoping to detect. That's a 100% gain, so the relative MDE would = 100. In Evan Miller's calculator, you'd plug those numbers in, and it would show you'd need a sample size of 1,767 people: https://www.evanmiller.org/ab-testing/sample-size.html#!1;80;5;100;1.…
That's right, perfect! Thank you so much!
I thought I should use another formula for calculating the sample size, such as "sample size for a two-sample t-test", so I asked about sample size for continuous metrics.
Glad the info was helpful for you!