By: Deborah O'Malley, M.Sc | Last updated May, 2021
When you hear the name "multi-armed bandit," it might conjure up a lot of interesting imagery for you.
Perhaps you envision a bank robber armed with many guns, or a Bonnie and Clyde-type figure about to embark on a thrilling cross-country car chase after an armed robbery.
In fact, the title "multi-armed bandit" references the old-school style slot machines that used to line Las Vegas casino halls.
These slot machines affectionately garnered the name "one-armed bandits" because, with the simple pull of a lever, or arm, you could run away with millions or lose your life savings.
So, what does a slot machine have to do with A/B testing, anyway?
Well, with a little luck, you might just see the connection. If you do, there's a big payout at the end. 😉
When you run an A/B test using the multi-armed bandit methodology, the situation is somewhat akin to selecting the casino slot machine with the highest chances of winning.
If you were to go to the casino to play the slots, you'd want to zero in on the machine with the highest likelihood to win and go straight to it.
You'd probably figure out the best slot machine by watching others play for a little while and observing the activity around you.
A machine that just gave a big payout wouldn't be as likely to win again, so you'd probably be more tempted to go to a machine that hadn't hit the jackpot in a while.
But there's a definite tension between choosing the arms that have performed well in the past and trying a new, or seemingly inferior, arm that you can only hope will perform better.
In multi-armed bandit testing, this same analogy can be applied.
However, instead of slot machines, each "arm" is a test variant.
With test variants, sticking with the tried-and-true control version is like staying on the same slot machine that's already paid out many times before. The chances of winning aren't quite as high -- but you never know what will come until you pay to play.
In contrast, each test variant is like a different slot machine that offers an exciting chance of winning.
As an optimizer, your goal is to find the version, or slot machine, with the best payout rate, while also maximizing your winnings.
Much easier said than done, but luckily, there are highly developed mathematical models for managing this multi-armed conundrum.
The Classic A/B Testing Approach
To fully understand the multi-armed bandit approach, you first need to be able to compare it against classical hypothesis-based A/B testing.
A classic A/B test positions a control against one or more variants.
In this standard A/B test set-up, typically 50% of the traffic is directed to the control and the other half to the variant.
The experiment runs with this traffic allocation until one version reaches a statistically significant conversion difference.
A winner is then declared, and if the winning version is implemented, all traffic will be directed to it -- until another experiment begins.
In this standard A/B testing methodology, even if one version far outperforms the other from the get-go, traffic will still be allocated evenly -- with 50% of visitors seeing the underperforming version and the other half directed to the version that is pulling way ahead.
An A/B testing purist will staunchly advocate that only a test which splits traffic equally (usually 50/50) -- and keeps that allocation throughout the duration of the test -- will yield accurate, reliable, statistically significant results.
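To make the purist's stopping criterion concrete, here's a minimal sketch of the kind of significance check a classical fixed-allocation test relies on: a two-proportion z-test on the control and variant conversion rates. This uses only the Python standard library, and the conversion counts are invented for illustration.

```python
# Minimal sketch of a classical A/B significance check (two-proportion z-test).
# Standard library only; the counts below are made-up example numbers.
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for the difference
    between two conversion rates (control vs. variant)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, computed via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Example: 50/50 split, control converts 200/5000, variant converts 260/5000.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")  # in this example, p falls below the usual 0.05 threshold
```

The point of the sketch is that the classical test only declares a winner once this p-value crosses a pre-set threshold, regardless of how lopsided the results look mid-test.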
The Multi-Armed Bandit Approach
However, one could justifiably argue that the notion of equal traffic allocation is a waste of time and visitor views. If one version is clearly pulling ahead, more traffic should be allocated to it sooner.
This is the approach multi-armed bandit testing takes.
In a multi-armed bandit experiment, traffic is initially equally split across the variants; however, as a winner begins to emerge, traffic is re-weighted and diverted to the best performing version.
The philosophy behind this approach is there's no logical reason to keep sending only half the traffic to a version that's garnering 80% of all conversions, for example.
So why not re-allocate the traffic to the winning version mid-test? That way, you give the version that appears to be the winner the best chance to pull ahead more quickly.
Tell this idea to an A/B testing purist and they'll balk!
They'll adamantly argue that, anytime traffic is unevenly allocated -- and especially if it's re-allocated mid-test -- it completely throws off the statistical validity and rigour of the test results.
As a result, unequal allocation of traffic will not yield statistically sound results -- and the approach is faulty. It should not be done!
However, there is a large and growing cohort of optimizers who argue the multi-armed bandit method -- in which traffic is weighted toward the winning version mid-test -- is perfectly legitimate.
This faction of testers argues not only for the legitimacy of multi-armed bandit testing, but also sings its praises as a completely valid, efficient, and effective way to run a study.
In fact, they argue, this method yields results more quickly and efficiently, with less traffic -- for the express reason that traffic is shifted, mid-test, to the version that appears to be pulling ahead as the winner.
In fact, according to Google:
"Experiments based on multi-armed bandits are typically much more efficient than 'classical' A/B experiments based on statistical-hypothesis testing. They’re just as statistically valid, and in many circumstances they can produce answers far more quickly."
In a multi-armed bandit test set-up, the conversion rates of the control and variants are continuously monitored.
An algorithm is applied to determine how to split the traffic to maximize conversions, sending more traffic to the best-performing version.
In most multi-armed bandit testing platforms, each variation in a test is assigned a weight and a creation date, and the platform tracks its number of views and conversions.
The views, conversions, and creation date are all assessed to determine the weight -- the percentage of visitors who will see that version. The weight is adjusted daily based on the previous cumulative results.
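A common way to compute these daily weights is Thompson sampling: each arm's conversion rate is modeled as a Beta distribution over its views and conversions, and an arm's traffic weight is the probability that it produces the highest sampled rate. Platforms don't all publish their exact algorithms, so treat this as an illustrative sketch (standard library only, with made-up counts), not any vendor's actual implementation.

```python
# Illustrative sketch of Thompson-sampling-style daily re-weighting.
# Each arm's conversion rate is modeled as Beta(conversions + 1, views - conversions + 1);
# an arm's weight is the fraction of simulated draws in which it samples highest.
import random

def bandit_weights(arms, draws=10_000, seed=42):
    """arms: dict of name -> (views, conversions). Returns name -> traffic weight."""
    rng = random.Random(seed)
    wins = {name: 0 for name in arms}
    for _ in range(draws):
        samples = {
            name: rng.betavariate(conv + 1, views - conv + 1)
            for name, (views, conv) in arms.items()
        }
        wins[max(samples, key=samples.get)] += 1
    return {name: count / draws for name, count in wins.items()}

# Example: the variant is converting better, so it earns more of tomorrow's traffic.
weights = bandit_weights({"control": (1000, 40), "variant": (1000, 55)})
print(weights)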
When new variants are added, the total traffic percentage changes. To explore the viability of the new variant, the system gives more traffic to the new variant to fairly test it against the other variants that have already been running.
Depending on how visitors interact with the control or variants, if one version starts to take the lead, the percentage of traffic will swing, and more traffic will be weighted to it.
If, after a period of time, visitor behavior changes and that version begins to underperform, the traffic percentage will adjust again.
If a certain version appears to be strongly underperforming, that variant can be turned off or stopped without any consequence. The test results will simply re-calculate the weights on the remaining versions running.
Eventually, if a test is left to run long enough, a clear winner should emerge.
While the multi-armed bandit testing methodology is very flexible and allows for variations to be added, eliminated, or re-weighted mid-experiment, just like any study, testers can get into trouble stopping a multi-armed bandit test too early.
Because there's no specific level of confidence or statistical significance to achieve, experimenters may be tempted to declare a winning result prematurely.
This outcome occurs when, for example, an experiment is stopped when only 85% of traffic is allocated to the winning version.
Although 85% of traffic represents a strong majority of visitors, and gives a sound indication that the leading version should indeed be declared the ultimate winner, the 15% of traffic still flowing elsewhere signals the algorithm retains meaningful uncertainty about the outcome.
As a result, a true winner should be implemented only once 95% or more of visitors have been reallocated to the leading version.
Using this 95% traffic benchmark, it's then recommended the test run for an additional 2 days to ensure there are no further traffic fluctuations.
By following the 95% guideline, testers can ensure their results are highly valid at what would equate to a 95% level of confidence and statistical validity in classical hypothesis-based A/B testing.
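The 95%-plus-two-days guideline above can be expressed as a simple check. This is only a sketch of the rule as described in this article -- not a feature of any testing platform -- and the daily traffic shares are hypothetical.

```python
# Sketch of the article's stopping rule: declare a winner only after the
# leading version has held at least 95% of traffic for the final two days.
def ready_to_stop(daily_leader_share, threshold=0.95, stable_days=2):
    """daily_leader_share: list of the leading arm's daily traffic share, oldest first."""
    if len(daily_leader_share) < stable_days:
        return False
    return all(share >= threshold for share in daily_leader_share[-stable_days:])

print(ready_to_stop([0.50, 0.71, 0.85, 0.96, 0.97]))  # True: at or above 95% on the last 2 days
print(ready_to_stop([0.50, 0.71, 0.96, 0.85, 0.97]))  # False: dipped below 95% the day before last
```

Encoding the rule this way makes the guideline auditable: you can run it against your platform's daily allocation export instead of eyeballing the dashboard.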
With this divide between A/B testing approaches, you may be caught in the middle, wondering which approach to use for your own studies.
While an equally allocated A/B test may uphold a strong level of statistical rigour, a multi-armed bandit test is highly likely to yield reasonably accurate results -- valid enough from which you can make sound business decisions.
These results, however, are more of an approximation of what is likely to work rather than absolute evidence of what has proven to work.
Therefore, you're sacrificing some degree of absolute statistical certainty and trading it off for a quicker, more efficient experiment that's likely to yield reasonably accurate test results.
So, if you can't tolerate even an iota of uncertainty, don't go with a multi-armed bandit approach; stick with classical hypothesis-based A/B testing instead.
Conversely, if you're okay with a reasonably high level of accuracy, and are willing to accept a small margin of error as a trade-off for faster, more efficient test results, go with a multi-armed bandit methodology.
2. There's really no certainty anyway. . .
Keep in mind, even if you're absolutely staunchly committed to upholding the highest level of statistical rigour, testing itself is an approximation of what you think is likely to continue to occur based on evidence gathered through past data.
While previous data provides a good indication of what has likely happened and will probably continue to occur, you can't put your utmost faith in it.
First off, the numbers aren't always accurate and don't always tell the true story. For example, your Google Analytics data may not be tracking all visitors, or accurately capturing all users in our emerging cookie-less world.
Secondly, the act of testing itself is speculative and trying to forecast future outcomes based on current data is even more problematic.
Case in point: when acclaimed A/B testing expert David Mannheim posed the question, "do you believe you can accurately attribute and forecast revenue to A/B testing?", a strong 58% of poll respondents answered, "no, A/B testing cannot accurately forecast revenue."
The rationale, as Mannheim explains in his article: experiment results are speculative in the first place, so how can you speculate upon a speculation and obtain complete accuracy?
Testing only provides a reasonable level of certainty no matter how statistically significant and valid the test.
So going with a quicker, more efficient testing method makes sense. After all, testing is put in place to mitigate risk, and you can always conduct follow-up experiments to further mitigate risk and confirm the ongoing accuracy of your teased-out assumptions.
Running an experiment using the multi-armed bandit methodology seems to be a perfectly legitimate way to more quickly derive reasonably accurate test results.
It should be strongly considered as a viable alternative to a traditional hypothesis-based testing approach.
What do you think?
Is the multi-armed bandit testing approach sound and reliable enough to base important conversion decisions on?
Will you consider using the methodology for your future tests? Why or why not?
Share your thoughts in the Comments section below.
By: Deborah O'Malley, M.Sc | Last updated April, 2021
Something you might not know about me is, when I’m not doing CRO and A/B testing, I’m busy in the kitchen making homemade fudge.
People say my fudge is the best they’ve ever had!
I take great pride in creating delicious chocolate treats, and am always on the hunt to find the best-tasting chocolate in existence.
For quite a while now, I’ve been searching for the best-ever chocolate icing.
Pre-Covid, back when my kids had such things as school and movie nights, I'd bake delicious cupcakes for the school movie bake sale. They were always a hit!
My secret is I’d use Duncan Hines packaged cake mix, but add real, homemade butter cream icing.
Last summer, my 9-year old son got quite into baking and would YouTube exciting, new recipes to try. He made cookies, breads, and even donuts. Some of it tasted, well, edible. . .
On his baking tear, one of the things he discovered was a delicious-tasting chocolate icing.
It’s branded as 1-minute icing – but that’s a complete misnomer since the ready-made product has to first sit in the fridge for 30 minutes!
This so-called 1-minute, easy icing is not for the faint of heart. You have to boil water and melt chocolate. But if you're willing to take on such a challenge, it's the most delicious icing you'll ever have!
At least, I think so. . .
To validate and quantify my statement, I subjected my poor family to yet another one of my silly A/B tests.
This time, I stacked the "The World’s Best Icing" up against your standard, classic, highly processed, pre-packaged brand name variety, Duncan Hines.
My kids were delighted to be able to try two different types of iced cupcakes!
Can you guess which one won?
The World’s Best Icing, of course. Well, mostly. . .
My husband, 7-year-old daughter, and I all agreed, the homemade icing was the best we'd ever had!
But my cheeky 9-year old son asserted he preferred Duncan Hines. Much to my dismay. . .
We all know 3 out of 4 is not a valid sample size.
So my call to action is to try this incredibly delicious icing for yourself.
Included as a special bonus is a paper format recipe of the World’s Best Icing so you don’t have to keep annoyingly stopping the original YouTube video to follow along while you’re making it.
Simply view or print this paper copy and try the icing for yourself. Then let me know if you agree it's the world's best chocolate icing -- or if you've discovered an even better recipe.
As with every other cooking website out there, consider this story my long, irrelevant prelude you have to read before you can actually get to the recipe itself. 😉
And without further ado, here it is: The World’s Best Chocolate Icing Recipe. Enjoy!
2. Add softened butter:
3. Add boiling water:
4. Use an electric mixer on medium speed, and beat until smooth (about 3 minutes):
5. Melt dark chocolate either in a chocolate melter, using low power in the microwave, or with a double-boiler. Add the melted chocolate to the icing mix and beat the mixture again until silky smooth (about 3 minutes):
6. Cover with saran wrap and refrigerate for 30 minutes:
7. Take out of the fridge, remove the saran wrap, and let the icing soften at room temperature for about 5 minutes. Then spread generously on cake or cupcakes.
9. A/B test against other icing recipes and share your thoughts in the comments section below. Do you agree it’s the best chocolate icing you’ve ever had?
Special thanks to Emma’s Goodies YouTube channel for originally providing this delicious recipe.
In this follow-up jam-packed 23-minute video interview, Mercer outlines the top three reasons why you need to start playing around with GA4 -- now!
While there are already plenty of resources out there on GA4, this tutorial is specifically for A/B testers and optimizers interested in better understanding GA4.
Listen and learn from one of the foremost analytics experts in the field.
Unlock Access. Sign up to become a Pro Member. Get complete access to this helpful content, plus so much more.
Google’s Universal Analytics (UA) is quickly becoming a legacy platform.
In its place, Google Analytics 4 (GA4) is being developed as the platform of choice.
But with this shift, optimizers have been left with more questions than answers and have developed plenty of misconceptions in the process.
Step in "Mercer."
Marketing analytics expert "Mercer" of MeasurementMarketing provides the answers you need while addressing the top three misconceptions about GA4, plus how to overcome them.
In this informative 17-minute video interview, you'll hear the top three misconceptions, plus learn:
While there's already lots out there on GA4, this video interview is specifically meant for A/B testers using Google Analytics, Tag Manager, and Optimize.
Listen and learn from one of the foremost analytics experts in the field.
If you're trying to figure out how to best present your product pricing for maximum sales success, The Essential Guide To High-Converting Pricing Strategies is all you need.
In this jam-packed 79-page guide, you'll get the exact pricing strategies and techniques you need to increase conversions and boost profits.
Simplify your marketing efforts while dramatically skyrocketing your revenue and returns.
This guide uncovers the most advanced pricing psychology research, distilled down into easy-to-understand, immediately applicable pricing principles and strategies.
Incorporating real-life A/B test case studies, from big-name brands, this guide gives you many money-making testing ideas and takeaways, helping you understand what works -- and doesn't -- with product pricing.
You'll learn exactly how to:
As a special bonus, you'll also receive a Quick Reference Summary that boils down the most pertinent pricing psychology research into applied, actionable steps.
Additionally, you'll get a helpful Pricing Checklist to inform, and fuel your future winning pricing A/B tests.
Ready to uncover powerful pricing psychology principles that will help boost your sales?
To access, simply click to download the PDF file below. Open, read, and enjoy seeing your sales skyrocket.
Google Optimize is a web-based A/B testing tool that enables you to set-up and run experiments to determine if one element, or version, of your website outperforms another. With this insight, you can create tests that help you quantitatively identify changes to improve your conversion rates and increase revenue.
Google Optimize is fantastic because it's free, highly customizable, and can be easily integrated with other Google tools like Google Analytics (GA) and Google Tag Manager (GTM).
But, because the platform has so many powerful integration capabilities, it, unfortunately, doesn't automatically just run out of the box. Some set-up is required before Google Optimize will track data and run properly.
One of the first steps you need to take to set-up Google Optimize is to create, or obtain, permissions to various containers.
Before jumping too far into setting-up Google Optimize, it’s important you first understand the difference between an account and a container, so you can most appropriately navigate and use the Google Optimize platform.
Here’s an overview:
Account: the top-level element of Google Optimize.
Within Google Optimize, you can see your accounts by going to the “All Accounts” link in the top left corner. Your accounts will display like this:
Within an account, there may be one or more containers.
Container: a bucket that holds all your website domains and experiments. Each container has its own unique, alphanumeric Container ID.
A container helps you keep everything organized and in one spot.
For example, within this account, there are two separate websites, within two different containers:
As a best practice, everything housed within one website domain should be placed within one container. If your website has sub-domains, it’s up to you if you’d like to make another container for the sub-domains.
Within each container, you can add users and give users different levels of account permission, to either modify, access, or only view the account.
To gain either “edit” or “publish” access, you need to either be an administrator of the account, or you need to contact an administrator who can give you access.
If you're the account administrator, or account creator, you'll automatically assume the highest level of access.
If you're not the account administrator, and, for example, you're running a Google Optimize experiment on behalf of a client/third-party, you’ll need, at minimum, Google Optimize "edit" permissions.
However, if the Optimize account does not yet have a Google Analytics (GA) property linked, you’ll need “publish” access, rather than “edit” access, to link the GA property to the Optimize account.
Therefore, if feasible, it's best to initially ask the account administrator for “publish” access to the Google Optimize account.
Once you've obtained the appropriate permissions, in order to set-up experiments in Google Optimize, and properly track the data, you'll need to obtain Google Analytics (GA) property "edit" access and link your GA account information into Google Optimize.
This linking is necessary to ensure proper goal and data tracking of your experiment, since Optimize itself doesn’t have its own measurement capabilities.
Here are step-by-step video and written instructions for how to obtain container "edit" and "publish" access in Google Optimize. Additional videos are also provided showing how to provide GA property "edit" access and link GA with Optimize so you can properly run and report on experiments:
In this step-by-step tutorial, you’ll learn what container "edit" access is in Google Optimize.
You'll also be given step-by-step instructions on how to obtain it, or provide container "edit" access, whether or not you're an account administrator.
"Publish" access enables you to link your Google Analytics (GA) properties with your Google Optimize account.
Linking GA into Optimize is necessary to ensure proper goal and data tracking of your experiment, since Optimize itself doesn't have its own measurement capabilities. It requires GA to report data.
To ensure you can link your GA account in Google Optimize, here are instructions showing how your account administrator can provide "publish" access:
If you're able to link the Google Optimize account to the Google Analytics (GA) account, you know you have container "publish" access.
Linking Google Analytics (GA) into Google Optimize is necessary to ensure proper goal and data tracking of your experiment. Since Optimize itself doesn’t have its own measurement capabilities, it requires GA to report the experiment data.
However, in order to have GA interface with Optimize, you need to link GA within Optimize.
To do so, you'll need, at minimum, property level "edit" access in Google Analytics.
Assuming you already have a GA account set-up, here's a short video showing what property level "edit" access is and how to obtain it:
With GA property "edit" access, you're now ready to link GA and Optimize.
In this short tutorial, you’ll see how to link GA and Google Optimize so you can properly track and report experiment data.
Here are step-by-step instructions showing how you link the GA and Optimize accounts:
In case it's easiest for you to follow along with written instructions, here are step-by-step instructions to obtain container access in Google Optimize:
If you need more help getting your Google permissions set-up, reach out to Todd Gamber, Chief Data Engineer at Confidence Interval.
Google Optimize is a powerful, and free, A/B testing and optimization tool. In this step-by-step tutorial, you’ll be walked through the exact process you need to install Google Optimize on a Squarespace site. Note, you must have a Premium plan, or higher to install GTM on Squarespace.
Here are step-by-step video instructions for installing the Google Optimize installation snippet within a Squarespace website:
Google Optimize is a powerful, and free, A/B testing and optimization tool. In this step-by-step tutorial, you'll be walked through the exact process you need to install Google Optimize on a standard Shopify website. Note, Shopify Plus is not covered in this tutorial. You'll learn the process you need to make installation seamless, plus how to verify the tracking code is installed properly and working. Considerations:
On April 27, 2020, Google Optimize released a new JavaScript (JS) installation snippet. As a result, the method for installing Google Optimize within Shopify changed.
With the updated Google Optimize snippet, the previous installation method no longer works.
You must now install Google Optimize in the theme header; this method is not preferred because the installation could get lost, or altered, if the theme is modified. However, it’s now the only option, unless you use a plugin.
Here are step-by-step video instructions for installing the updated Google Optimize installation snippet within a Shopify theme:
Google Optimize is a powerful — and free — A/B testing platform. In these step-by-step tutorials, you'll be walked through the exact process you need to install Google Optimize on a WordPress site. You'll learn the tools, or plugins, you need to make installation seamless. Plus, how to verify the tracking code is installed properly and working.
If your website is on WordPress, you have a couple options you can choose to install Google Optimize.
You can either install Google Optimize:
If you choose to use a plugin, it’s suggested you either go with a Header/Footer code manager plugin, or a paid plugin.
There are many Header/Footer plugins available. The suggested Header/Footer plugin is called Header Footer Code Manager. It can be downloaded for free here.
It should be installed on your WordPress site, prior to watching the installation tutorial.
Monster Insights Plugin
The recommended paid plugin is called Monster Insights. It can be purchased here, and should be activated on your site, prior to watching this tutorial.
Overall, Monster Insights is the recommended installation method because you can easily integrate Google Analytics (GA) and Google Tag Manager (GTM). As well, the Monster Insights plug-in makes installation of the anti-flicker snippet much easier.
In order to install Google Optimize on a WordPress site, you’ll need either to have administrative access to the WordPress site, or be in contact with someone who has admin access and can give you access, or can do the installation for you.
Here are step-by-step instructions for installing Google Optimize with the Header/Footer Code Manager plugin, plus how to verify Optimize is installed correctly:
Here are step-by-step instructions for installing Google Optimize with the Monster Insights plugin, plus how to verify Optimize is installed correctly:
Note, these installation instructions show Google Optimize installation using the default synchronous code. You can learn more about the benefits and drawbacks of synchronous code here.
Once you’ve installed Google Optimize, it’s a smart idea to make sure it’s installed properly. You can easily verify the installation by clicking on the “Check Installation” button:
If there’s an issue, you might get an error message that looks something like this:
Depending on the problems found, you might need to contact a Google Optimize, or Google Analytics, consultant, like Confidence Interval, to help you detect and fix the problem.
Or, you might decide it’s best to choose a different method of installation, such as using a paid plug-in, or installing directly through Google Tag Manager.
Assuming the installation is done properly, you’ll see a green check mark that looks like this:
You can also use Google Tag Assistant to verify the installation is done properly. Within Tag Assistant, you’re looking for a green Google Optimize icon, like this:
Here are step-by-step instructions showing you how to verify Google Optimize is installed correctly: