By: Deborah O'Malley | Last updated September 2022
If you've been into experimentation long enough, you've likely come across the term MDE -- which stands for Minimum Detectable Effect.
The MDE sounds big and fancy, but the concept is actually quite simple when you break it down. It's the smallest effect size, or conversion lift, you want your experiment to be able to reliably detect.
As this GuessTheTest article explains, in order to run a trustworthy experiment -- one that's properly powered and based on an adequate sample -- it's crucial you calculate the MDE.
But not just calculate it.
Calculate it AHEAD of running the experiment.
The problem is, doing so can feel like a tricky, speculative exercise.
After all, how can you possibly know what effect, or conversion lift you want to detect from the experiment?! If you knew that, you wouldn't need to run the experiment to begin with!
Adding insult to injury, things get even more hazy because the MDE is directly tied into your sample size requirements.
The larger the MDE, the smaller the sample size needed to run your experiment. And vice versa. The smaller the MDE, the bigger the sample required for your experiment to be adequately powered.
But if your sample size requirements are tied into your MDE, and you don't know your MDE, how can you possibly know the required sample size either?
The answer is: you calculate them. Both. At the same time.
There are lots of head-spinning ways to do so. This article outlines a few.
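To make the MDE-to-sample-size relationship concrete, here's a minimal sketch in Python using the standard normal-approximation formula for a two-sided test of proportions. The baseline rate of 0.42% matches the example later in this article; an online calculator's exact numbers may differ slightly depending on the formula it uses.

```python
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_mde,
                            alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided
    z-test of proportions (normal approximation, 50/50 split)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 at 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 at 80% power
    delta = baseline_rate * relative_mde           # absolute lift to detect
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return ((z_alpha + z_beta) ** 2) * variance / delta ** 2

# The smaller the MDE, the larger the sample you need -- and vice versa:
for mde in (0.50, 0.10, 0.05):
    n = sample_size_per_variant(0.0042, mde)
    print(f"relative MDE {mde:.0%}: ~{n:,.0f} visitors per variant")
```

Notice how halving the MDE quadruples the required sample -- the sample size grows with the inverse square of the effect you want to detect.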
But, if you're not mathematically inclined, here's the good news: a pre-test sample size calculator can do the MDE math for you.
Now, as said, that's the good news!
The bad news is, even a calculator like this one isn't all that intuitive.
So, to help you out, this article breaks down exactly what you need to input into an MDE calculator, with step-by-step directions and screenshots so you'll be completely clear and feel fully confident every step of the way.
Let's dig in:
To work this calculator, you’ll need to know your average weekly traffic and conversion numbers.
In Google’s current Universal Analytics, traffic data can be obtained by going to the Audience/Overview tab:
It’s typically best to take a snapshot of at least 3 months to get a broader, big-picture view of your audience over time.
For this example, let’s set our time frame from June 1 - Aug. 31.
Now, you can decide to look at these numbers three ways:
Given these differences, calculating the total number of users will probably give you the most accurate indication of your traffic trends.
With these data points in mind, over the 3-month period, this site saw 67,678 users. There are typically about 13 weeks in 3 months, so to calculate users per week you’d divide 67,678/13=5,206.
In other words, the site received about 5,206 users/week.
You’d then plug this number into the calculator.
Assuming you’ve set-up conversion goals, you’ll next assess the number of conversions by going to the Goals/Overview tab, selecting the conversion goal you want to measure for your test, and seeing the number of conversions:
In this example, there were 287 conversions over the 3-month time period which amounts to an average of 287/13=22 conversions/week.
You’d now plug the traffic, conversion, and variant numbers into the calculator:
Now you can calculate your baseline conversion rate, which is the rate at which your current (control) version is converting.
This calculator will automatically calculate your baseline conversion rate for you, based on the numbers above.
However, if you want to confirm the calculation, simply divide the number of goals completed by the traffic, which in this case is 22 conversions per week divided by 5,206 visitors per week (22/5,206=0.0042). To get a percentage, multiply this amount by 100 (0.0042*100=0.42%).
You’d end up with a baseline conversion rate of 0.42%:
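The steps above can be sketched in a few lines of Python, using the worked numbers from this example (3 months ≈ 13 weeks):

```python
# Worked numbers from the example above.
users_3mo = 67_678
conversions_3mo = 287
weeks = 13

users_per_week = users_3mo / weeks               # ~5,206 users/week
conversions_per_week = conversions_3mo / weeks   # ~22 conversions/week
baseline_rate = conversions_per_week / users_per_week

print(f"{users_per_week:,.0f} users/week")
print(f"{conversions_per_week:.0f} conversions/week")
print(f"baseline conversion rate: {baseline_rate:.2%}")
```

Note that the weeks cancel out, so the baseline rate is simply total conversions over total users (287/67,678 ≈ 0.42%).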
As a general A/B testing best practice, you want a confidence level of at least 95% and statistical power of at least 80%:
Based on these numbers, the pre-test sample size calculator is indicating to you that you’ll want to run your test for:
As a very basic rule of thumb, some optimization experts, like Ronny Kohavi, suggest setting the relative MDE to a maximum of 5%.
If the experiment isn't powered enough to detect a 5% effect, the test results can't be considered trustworthy.
However, it's also dangerous to go much beyond 5% because, at least in Ronny's experience, most trustworthy tests don't yield more than a 5% relative conversion lift.
As such, for a mature testing organization with large amounts of traffic and an aggressive optimization program, a relative 1-2% MDE is more reasonable and is still reason to celebrate.
In the example shown above, the relative MDE was 46.43%, which is clearly above the 5% best practice.
This MDE indicates traffic is on the lower side and your experiment may not be adequately powered to detect a meaningful effect in a reasonable timeframe.
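You can see why low traffic forces a large MDE by inverting the sample size formula: given the visitors you can realistically collect, what's the smallest relative lift you could detect? The sketch below uses the same normal-approximation formula as before, with illustrative test durations; a given calculator's exact figures (like the 46.43% above) may differ slightly.

```python
from math import sqrt
from statistics import NormalDist

def detectable_relative_mde(baseline_rate, n_per_variant,
                            alpha=0.05, power=0.80):
    """Smallest relative lift detectable with a given sample per
    variant (two-sided z-test, normal approximation, 50/50 split)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    delta = z * sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_variant)
    return delta / baseline_rate

# Illustrative: ~5,206 users/week split evenly across 2 variants,
# at the 0.42% baseline conversion rate from the example above.
for weeks in (2, 6, 12):
    n = 5206 * weeks / 2
    mde = detectable_relative_mde(0.0042, n)
    print(f"{weeks} weeks: detectable relative MDE ~{mde:.0%}")
```

Even after several weeks, the detectable MDE stays far above the 5% rule of thumb -- which is exactly the situation the 46.43% figure above describes.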
In this case, if you do decide to proceed with running the test, make sure to follow these guidelines:
Hope this article has been useful for you. Share your thoughts and comments below: