A/B TESTING MASTERY

Stenburgen Ruwa
4 min read · Sep 21, 2020

History of A/B Testing

It all started back in 1995, when people in tech did A/B testing by digging deep into log files to understand the real behavior on their websites. They then made some changes and started measuring again to see if there was any difference.

Around the year 2000 came redirect-based testing, which only worked if the browser you were using supported JavaScript redirection. Back then, if you visited a website it took around four seconds to load. Immediately after that load time you would hear a click, and then you were redirected to a different URL that presented you with version B of the website. During this period cookies were not being used at all, so if you re-entered an experiment, in roughly 50% of cases you would end up in the wrong variation.

In 2003 the first real A/B testing tools, Offermatica, Optimost and Memetrics, came onto the market as very expensive enterprise software solutions. With these tools people started using cookies, which allowed them to run a proper randomized controlled trial.

Back in 1995 a randomized controlled trial was already the way to go if you wanted to know whether a medicine was working or not. In those experiments researchers took two groups of patients and ran a double-blind study: the patients didn't know whether they were in group A or group B, and neither did the researchers.

Google Website Optimizer, created by Google, was a free solution that could do JavaScript redirects. Marketers back then also found ways to inject code into webpages, so they could optimize a page on the client side. It was the first affordable A/B testing tool on the market at the time.

In 2010 VWO and Optimizely came onto the market. They created drag-and-drop solutions for A/B testing: any marketer could log in, drag and drop something, press start and run the experiment. Although many people entered the A/B testing scene as a result of these cheaper tools, they also made a lot of mistakes.

As of 2019 A/B testing has become a mature market. Looking back: in 1995 we started with log file analysis, comparing one week against another; then we moved on to redirect scripts without cookies, and later with cookies. In 2006 a real optimizer tool was introduced by Google to run experiments with, and later came the drag-and-drop interfaces. By 2016 the field had moved on to frameworks, personalization and AI, all the way to server-side A/B testing experimentation in 2019.

When conducting experiments, there is only one way to make better and trustworthy decisions: apply randomized controlled trials. You should apply A/B testing in your company to make sure you are making the right decisions, to speed them up, and to base them on trustworthy evidence rather than just expert opinions.

A/B testing is used as part of research that feeds into a conversion signal map. Through the conversion signal map you use positive, flat-line and negative variations to determine the impact on the website. To understand whether an experiment is making an impact, you optimize on the client side in the form of a lean deployment, so you can see whether your experiment works before fully implementing it. The experiment that turns out to be the winner is the one that eventually gets deployed.

Before you conduct an A/B test you have to make sure that you have enough data. According to Jeff Bezos, the success of Amazon is directly correlated to the number of experiments they conduct per year, per month, per week and per day.

The ROAR model is a rule of thumb for when you can run experiments, and how many. It has four optimisation phases: phase one is Risk, phase two is Optimization, phase three is Automation, and phase four is the Rethink phase, which matters especially if your business declines. It is advisable not to conduct an A/B test if your conversions per month are below 1,000; in that case you will hardly ever find a winner, since you are low on data.

Statistical power is the likelihood that an experiment will detect an effect when there is an effect to be detected. So if you have created a good challenger, something that really has an effect, you have to make sure that effect can be detected when you run the A/B test. Statistical power depends on the sample size, the effect size and the significance level.
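To make that relationship concrete, here is a minimal sketch of how you could check the power of a planned test in Python, assuming the statsmodels library is available. The 5% baseline conversion rate, the 6% target rate and the 5,000 visitors per variant are hypothetical numbers, not figures from this article.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical example: 5% baseline conversion, hoping to detect a lift to 6%
effect_size = proportion_effectsize(0.06, 0.05)

power = NormalIndPower().solve_power(
    effect_size=effect_size,
    nobs1=5_000,   # visitors per variant (hypothetical)
    alpha=0.10,    # 90% significance level
    ratio=1.0,     # equal traffic split between A and B
)
print(f"Power with 5,000 visitors per variant: {power:.2f}")
```

Increasing the sample size or the effect size, or allowing a higher alpha, all push the power up, which is exactly the dependency described above.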

Significance means testing at a high enough significance level (90% or 95%). Imagine you are running 100 experiments of A versus A, in other words the control experiment versus the same control experiment, and testing at a 90% significance level.

You will find that out of those 100 experiments you get about 10 winning outcomes at the 90% significance level. Unfortunately these are not real winners, since they come from control versus control; they are false positives, measured as a win but in reality not a win. So if you lower the bar from a 90% significance level to around 80% or 70%, your number of false positives will obviously go up.
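You can see this effect in a quick simulation. The sketch below, assuming numpy and statsmodels are installed, runs 100 A/A experiments with a made-up 5% conversion rate and 10,000 visitors per variant, and counts how many of them look like winners at a 90% significance level.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
true_rate = 0.05       # both "variants" share the same conversion rate
visitors = 10_000      # visitors per variant per experiment (hypothetical)
alpha = 0.10           # 90% significance level
false_positives = 0

for _ in range(100):   # 100 A/A experiments
    conv_a = rng.binomial(visitors, true_rate)
    conv_b = rng.binomial(visitors, true_rate)
    _, p_value = proportions_ztest([conv_a, conv_b], [visitors, visitors])
    if p_value < alpha:
        false_positives += 1

print(f"False positives out of 100 A/A tests: {false_positives}")
```

On average about 10 of the 100 identical comparisons come out as "significant", which is exactly the false positive rate a 90% significance level implies.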

Power and significance rule of thumb.

With power, when you start A/B testing, try to test on pages where you have high power, like 80% or more. That way you have a big chance of finding a winner if there is a winner to be detected; otherwise you won't detect the effect, which means too many false negatives.

Under significance, when you start, test against a high enough significance level, around 90%; otherwise you will declare winners when in reality there is no effect, also known as false positives. As a starter, stick to 80% power and a 90% significance level when you begin running A/B tests.
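As a rough illustration of what that rule of thumb means for traffic, here is a small sketch that solves for the required sample size at 80% power and a 90% significance level, again assuming statsmodels and using a hypothetical lift from 5% to 6% conversion.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical: detect a lift from a 5% to a 6% conversion rate
effect_size = proportion_effectsize(0.06, 0.05)

n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    power=0.80,   # 80% power, per the rule of thumb
    alpha=0.10,   # 90% significance level
    ratio=1.0,
)
print(f"Visitors needed per variant: {round(n_per_variant)}")
```

If the required sample size is far above the traffic a page actually gets, the test is unlikely to find a winner, which is the practical side of the earlier advice about not testing on low-conversion pages.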
