Bayesian statistics isn’t new. It’s been around for a while. Since the 1760s, in fact.
But it’s just now making inroads as computing capabilities improve and marketers become more savvy about proper A/B testing techniques. Case in point: A/B testing software company Visual Website Optimizer (VWO) recently started using Bayesian statistics in its SmartStats A/B testing engine.
Bayesian statistics is an approach to statistics that can be applied to A/B testing to more accurately estimate the range of your conversion rate, answering the question: “What’s the probability of version B beating version A?”
Named after the British mathematician Thomas Bayes, the model is based on Bayes’ Theorem, also called Bayes’ Rule. The theorem is expressed with this formula:

P(A|B) = P(B|A) × P(A) / P(B)

Breaking down the formula:

- P(A|B) is the probability of A, given that B has occurred (the posterior)
- P(B|A) is the probability of B, given that A has occurred (the likelihood)
- P(A) is the probability of A on its own, before seeing any evidence (the prior)
- P(B) is the probability of B on its own (the evidence)
Is your head spinning yet? Stick with it, it’ll all make sense in a minute…
First though, you need to accept the mindset that, in Bayesian reasoning, you can continually update your beliefs about data as you gather evidence. In A/B testing, you gather evidence by running a test. After observing the evidence, your opinions may change. This is the idea behind Bayesian probability.
Probability is a degree of belief, formed by inference. As you gather new, relevant information, you revise your opinion.
Conditional probability is the likelihood of an event or outcome, based on the occurrence of a previous event or outcome; the probability is contingent on previous results.
It’s easiest to understand the idea of conditional probability with a simple example.
Imagine a bag filled with three colored blocks (yellow, purple, and white). Initially, each block has an equal chance of being picked out of the bag, so the probability of drawing the yellow block first is 33% (1/3). With two blocks left, there’s a 50% (1/2) chance of drawing the purple or the white block. So the conditional probability of drawing the purple block, given that the yellow block was drawn first, is 50%, and the joint probability of drawing yellow and then purple is about 16.7% (1/3 × 1/2 = 1/6).
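If you’d rather check those numbers than take them on faith, here’s a quick simulation sketch in Python (the block names and trial count are just for illustration):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

blocks = ["yellow", "purple", "white"]
trials = 100_000
yellow_first = 0
yellow_then_purple = 0

for _ in range(trials):
    # Draw two blocks from the bag, in order, without replacement
    first, second = random.sample(blocks, 2)
    if first == "yellow":
        yellow_first += 1
        if second == "purple":
            yellow_then_purple += 1

p_yellow = yellow_first / trials                           # ~1/3
p_purple_given_yellow = yellow_then_purple / yellow_first  # ~1/2
p_joint = yellow_then_purple / trials                      # ~1/6

print(f"P(yellow first) = {p_yellow:.3f}")
print(f"P(purple | yellow first) = {p_purple_given_yellow:.3f}")
print(f"P(yellow then purple) = {p_joint:.3f}")
```

Over 100,000 draws, the three estimates settle near 1/3, 1/2, and 1/6, matching the arithmetic above.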
Prior probability is the original probability of an event, before new evidence or information is obtained. New information updates the probability, giving a more accurate measure of the outcome.
You can better understand this concept by, again, looking at our colored block example.
We know there are three blocks; one of them is white. If all three blocks are in the bag, the probability of drawing the white one is 33% (1/3). Now, imagine a block is pulled out. It’s not the white one, but we don’t know which color it is. Since there are only two blocks left, the prior probability of 33% updates to a posterior of 50% (1/2). With this new evidence, we know we’re equally likely to pull the white block or the other colored block. The new evidence updates the probability.
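That update can also be verified exactly, by enumerating every equally likely order the blocks could come out in (a minimal sketch; the block names are just illustrative):

```python
from itertools import permutations

blocks = ("yellow", "purple", "white")
orders = list(permutations(blocks))  # all 6 equally likely draw orders

# Prior: before any evidence, P(white is the first block drawn)
prior = sum(o[0] == "white" for o in orders) / len(orders)

# Evidence: the first block drawn was NOT white
consistent = [o for o in orders if o[0] != "white"]

# Posterior: given the evidence, P(white is the next block drawn)
posterior = sum(o[1] == "white" for o in consistent) / len(consistent)

print(prior, posterior)  # prior is 1/3, posterior is 1/2
```

Throwing out the draw orders that contradict the evidence and recomputing the probability from what’s left is Bayes’ Rule in its simplest form.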
Now, let’s take these examples and bring them back to A/B testing.
In Bayesian statistics, the probability of your hypothesis being correct is based on evolving data and informed by what’s happened up to that point. As in the block examples, your opinion can evolve with your data. The more evidence you gather throughout your experiment, the more accurately you can confirm or deny your expectations. Talk about a forgiving and flexible model.
There’s no such flexibility in traditional testing, which is rooted in frequentist statistics. Things are much more black and white: you can only reject (or fail to reject) the null hypothesis. So you can only say, for example, that variant A did not perform the same as variant B, based on a p-value of 0.02 falling below your significance level.
But, in Bayesian statistics, there’s no significance level because you’re not confirming or rejecting a null hypothesis. Instead, you’re using probability to estimate how likely different outcomes are.
With this approach, it’s possible to estimate the range of your conversion rate, answering an essential question: “What’s the probability the new version will outperform the old?” Bayesian statistics enables you to confidently say, for example, “there’s a 95% chance variant B has a 7% uplift over variant A.”
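Here’s one common way that kind of probability is computed in practice: model each variant’s conversion rate as a Beta distribution and compare samples from the two posteriors. This is a sketch with made-up traffic numbers and a uniform Beta(1, 1) prior, not the exact algorithm any particular tool uses:

```python
import random

random.seed(0)  # fixed seed for a reproducible estimate

# Made-up data: visitors and conversions for each variant
visitors_a, conversions_a = 1000, 50   # 5.0% observed rate
visitors_b, conversions_b = 1000, 60   # 6.0% observed rate

def posterior_sample(conversions, visitors):
    # With a uniform Beta(1, 1) prior, the posterior conversion rate
    # is Beta(conversions + 1, non-conversions + 1)
    return random.betavariate(conversions + 1, visitors - conversions + 1)

# Monte Carlo: how often does B's sampled rate exceed A's?
draws = 20_000
b_wins = sum(
    posterior_sample(conversions_b, visitors_b)
    > posterior_sample(conversions_a, visitors_a)
    for _ in range(draws)
)

p_b_beats_a = b_wins / draws
print(f"P(B beats A) = {p_b_beats_a:.2f}")
```

With this data the estimate lands above 0.8: a direct answer to “what’s the probability B beats A?” that a p-value can’t give you.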
In frequentist statistics, there’s no way to answer this question. You can only say the two variants do not perform equally. If your results are inconclusive, tough. You have no idea why, or by how much. You have to re-test and hope for the best.
What’s also great about Bayesian statistics is that you can run A/B tests irrespective of sample size, because your observations are always evolving. So you don’t have to worry about calculating sample size in advance. And you don’t have to spend weeks or months running a test to get valid results if traffic on your site is low.
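One way to see why advance sample-size calculations matter less here: the posterior’s uncertainty simply shrinks as traffic accumulates, so you can watch it narrow and stop when it’s tight enough for your decision. A sketch, using a made-up 5% observed conversion rate and the same Beta posterior as above:

```python
import math

def beta_posterior_std(conversions, visitors):
    # Standard deviation of a Beta(conversions + 1, non-conversions + 1)
    # posterior, i.e. how uncertain we still are about the true rate
    a = conversions + 1
    b = visitors - conversions + 1
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# The same 5% observed rate at growing sample sizes
for visitors in (100, 1_000, 10_000):
    conversions = visitors // 20
    print(f"{visitors:>6} visitors: +/- {beta_posterior_std(conversions, visitors):.4f}")
```

Each tenfold increase in traffic cuts the uncertainty by roughly a factor of three, and there’s no fixed threshold the test must reach before the numbers are meaningful.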
Bayesian statistics also mitigates some methodology errors made during A/B testing. Even if you peek at your test early, the results remain statistically sound. And you don’t have to worry about falsely declaring a winner before the test is done, because you’re not relying on statistical significance.
First off, know that using frequentist statistics isn’t wrong or bad. It’s just not entirely the right approach for A/B testing.
As Chris Stucchio, VWO’s former Director of Data Sciences remarked:
Frequentist statistics provides probabilities about hypothetical experiments, which is unintuitive when trying to get real-life results. But, the approach is popular because computing with the frequentist method is easy. Frequentist methods can be computed in microseconds, while Bayesian methods can take many minutes. And, ten years ago, it probably wouldn’t have even been possible to compute with Bayesian statistics, at least at the scale that’s being done today.
The Bayesian approach, however, delivers more concrete, easily understandable results, helping you call a winning variant with greater clarity and confidence, in less time.
Some other benefits of the Bayesian model are it:

- Answers the question you actually care about: the probability one variant beats another
- Works irrespective of sample size, so you don’t have to calculate it in advance
- Keeps results statistically sound even if you peek at a test early
But, some drawbacks are it:

- Is computationally expensive; frequentist methods run in microseconds, while Bayesian methods can take many minutes
- Has only recently become feasible to compute at the scale being done today
In the near future, it’s likely the Bayesian model will come into the mainstream and more tests will be run using Bayesian statistics.
What do you think? Do you use a Bayesian approach? What do you think of it? What are some of the benefits and challenges?
Let us know in the comments section below.