How to use A/B tests to improve your email campaigns every week

Nora Landis-Shack on May 14, 2014

“Hey.”

That’s the subject line that helped Obama raise 690 million dollars.

But how could Obama’s team know it would work?

It wasn’t because they had “one weird trick”. They were using a tried and true method:

A/B testing.

By testing the subject lines, layout, and content of their email marketing campaigns, Obama’s team was able to dramatically increase results from their fundraising efforts.

Big companies have people dedicated to just running A/B tests because this stuff works. That doesn’t mean that smaller companies can’t do A/B tests.

In fact, I might go so far as to say:

You need to be A/B testing.

What is A/B Testing?

For the folks who may not know, and as a refresher for those who do, A/B testing (also called split testing) is a method that helps you test a hypothesis about human behavior.

In an A/B test, people are split at random into two groups. For simplicity’s sake, let’s say they’re split 50/50 into each group (though it can be 45/55, 30/70, etc.).
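As a rough sketch, the random split could look like this in Python. The function name `assign_groups` and the `control_share` parameter are made up for illustration, and a real email tool will handle this step for you.

```python
import random

def assign_groups(user_ids, control_share=0.5, seed=None):
    """Randomly split users into a control group and a variation group.

    control_share is the fraction sent to the control (0.5 gives a 50/50 split).
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)  # random order, so neither group is hand-picked
    cutoff = round(len(shuffled) * control_share)
    return shuffled[:cutoff], shuffled[cutoff:]

# Hypothetical list of 1,000 user IDs, split 50/50.
control, variation = assign_groups(range(1000), control_share=0.5, seed=42)
print(len(control), len(variation))  # 500 500
```

Shuffling before cutting matters: if you split by signup date or alphabetically, the two groups could differ in ways that have nothing to do with your test.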

Without being told, the users in both groups are presented with a stimulus: the original condition, known as the control, or a variation (what you’re testing). Then you compare the performance of the variation against the performance of the control.

If the new variation is more successful—because it drives more clicks, conversions, engagement, etc.—you can replace the old control with the variation and use it as the new control for future tests. If the variation does not outperform the control, stick with what you have and try a new test to improve on your results from the control.

An A/B test has two main benefits:

It gives you data to back up what you think might be successful.
It’s probably the cheapest way you can improve your sales or conversions.

A/B testing can be used on almost anything you can think of: your design, email copy (in the body or subject), calls to action, etc.

A/B testing helps avoid the correlation = causation trap

You might have heard the phrase “correlation does not equal causation”. Let’s define both of these:

Correlation is when two data sets are strongly linked together (e.g. as the values in one increase, the values in the other tend to increase too).
Causation is when a change in one data set actually produces the change in the other.

“Correlation does not equal causation” is the principle that while two data sets might appear to be linked, a change in one doesn’t necessarily cause changes in the other.

Here is my favorite absurd example:

[Chart: mozzarella consumption plotted alongside engineering degrees. Source: Spurious Correlations]

While mozzarella and engineering degrees correlate, there’s no real way to prove that cheese consumption actually leads to more engineering degrees.

If you made that argument based on this graph you’d probably convince a handful of people, but reasonable people would likely see that there’s no proof one causes the other.
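To see how easily this happens, here is a small Python sketch that computes the correlation between two series that simply both grow over time. The numbers and the variable names are invented for illustration; they are not the data behind the chart.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Two made-up metrics that both happen to trend upward over eight months.
office_coffee_orders = [10, 12, 13, 15, 18, 20, 21, 24]
email_conversions = [200, 230, 235, 260, 300, 310, 330, 360]

r = pearson(office_coffee_orders, email_conversions)
print(f"correlation: {r:.2f}")  # very close to 1, yet coffee is not driving conversions
```

Any two things that rise over the same period will correlate like this, which is why a rising conversion rate after a change proves very little on its own.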

One common mistake people make while testing is assuming their change has an impact without data from A/B tests to back it up.

Here’s what happens when you don’t use A/B tests.

You make a change that you think will have an impact. If your conversion rate goes up, you assume that your change is the catalyst. But you have no way to prove it.

If you set up a test beforehand, you can measure the effect of your change against your control. That way, if the change outperforms your control, you have visible proof of what works and what doesn’t.

How to make sure your tests help you reach your goals

Now that you know what an A/B test can do for you, it’s time to set one up.

As simple as A/B tests are, a lot goes into making them successful. You can run A/B tests all day, but if you don’t run them properly, you’ll get nowhere in terms of your long-term goals.

Here are a few best practices to make sure you’re not wasting your time and efforts with your A/B tests.

Spend some time figuring out what your test’s goal will be. Going blindly into testing will only waste your time.

Do some research to see what other companies have done (and whether it’s improved their emails) and then evaluate if that change is right for you.

For example, even if changing the color of your call to action leads to a 1.4% increase in conversions, unless you’re the size of Google or Amazon this might just mean a few hundred dollars in sales. Certainly nothing to sneeze at, but it’s worth evaluating whether your time spent on smaller changes (vs creating a new variation from scratch) is worth the outcome.
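Here is a back-of-the-envelope version of that math in Python. Every number below is hypothetical, and the sketch assumes the 1.4% is a relative lift on your existing conversion rate.

```python
def extra_revenue(recipients, baseline_rate, relative_lift, order_value):
    """Estimate added revenue per send from a relative lift in conversions."""
    baseline_orders = recipients * baseline_rate
    extra_orders = baseline_orders * relative_lift
    return extra_orders * order_value

# Hypothetical list: 20,000 recipients, 2% convert, $50 average order.
gain = extra_revenue(
    recipients=20_000,
    baseline_rate=0.02,
    relative_lift=0.014,  # the 1.4% increase, treated as a relative lift
    order_value=50.0,
)
print(f"${gain:,.2f}")  # $280.00
```

Plug in your own list size and order value before deciding whether a small tweak is worth a test slot.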

Only change one thing at a time. For example, don’t test two different subject lines and two different layouts at once. This muddies the comparison with your control: you won’t know whether the subject line or the layout caused the effect. Pick one thing to test against a control; you can always run another test after you get the results from the first one.

However, those small changes will only take you so far. Eventually you’ll reach a “local maximum” and get to the point where in order to make a dent in your numbers, you’ll need to start from scratch.

A successful A/B test MUST be statistically significant

Reaching statistical significance means that the results of your test are unlikely to be due to random chance alone and are more likely due to the changes you’ve made.

It’s shockingly easy to get results that are due to random chance. A great way to see this in action is running an A/A test using just your control. An A/A test works exactly the same way as an A/B test, but gives the same experience to both groups. It serves as a test to see how much noise exists from underlying natural variations. The amount of noise from natural variation will affect how you determine what statistical significance will mean for your test.
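You can get a feel for that noise without sending a single email. The Python sketch below simulates an A/A test: both groups have the same true open rate, which is an assumed 20% chosen for illustration, yet the measured rates still differ.

```python
import random

def simulate_open_rate(n_recipients, true_rate, rng):
    """Simulate sending to n_recipients who each open with probability true_rate."""
    opens = sum(1 for _ in range(n_recipients) if rng.random() < true_rate)
    return opens / n_recipients

rng = random.Random(7)  # fixed seed so the run is repeatable
true_rate = 0.20        # assumed open rate, identical for both groups

for n in (100, 1_000, 10_000):
    a = simulate_open_rate(n, true_rate, rng)
    b = simulate_open_rate(n, true_rate, rng)
    print(f"n={n:>6}: A={a:.3f}  B={b:.3f}  gap={abs(a - b):.3f}")
```

The gap between the two identical groups tends to shrink as the groups get larger, which is the intuition behind needing a big enough sample.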

To make sure you’re reaching statistical significance, you have to calculate your correct sample size beforehand and run your test until that sample is reached.
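One common way to estimate that sample size is the standard formula for comparing two proportions. The Python sketch below uses it with conventional defaults (a 5% significance level and 80% power). These defaults and the function name are assumptions for illustration, not necessarily what Customer.io uses.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_rate, expected_rate, alpha=0.05, power=0.80):
    """Approximate recipients needed in EACH group for a two-sided test.

    baseline_rate: the control's current conversion rate
    expected_rate: the rate you hope the variation will reach
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power=0.80
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    effect = expected_rate - baseline_rate
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical goal: detect a rise from a 20% to a 22% open rate.
n = sample_size_per_group(0.20, 0.22)
print(n)  # roughly 6,500 recipients per group
```

Notice how quickly the number grows as the effect you want to detect shrinks: halving the expected improvement roughly quadruples the sample you need.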

If you end a test prematurely, you run the risk of concluding that an effect or relationship exists where it actually doesn’t. These false positives are a common mistake, and if you’re not careful, you might choose a winner that actually has worse performance over the long term.
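The Python simulation below sketches why stopping early is risky. It runs many A/A tests where there is truly no difference, then compares how often you’d declare a winner if you checked after every batch versus only once at the end. The two-proportion z-test, the batch sizes, and the 20% rate are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def is_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test using the pooled rate."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = (conv_b / n_b - conv_a / n_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def false_positive_rates(n_tests=300, batches=10, batch_size=100, rate=0.2, seed=1):
    """Return (rate when peeking after every batch, rate when checking once at the end)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        flagged_at_any_peek = False
        significant = False
        for _ in range(batches):
            # Both groups share the same true rate, so any "winner" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(batch_size))
            conv_b += sum(rng.random() < rate for _ in range(batch_size))
            n += batch_size
            significant = is_significant(conv_a, n, conv_b, n)
            flagged_at_any_peek = flagged_at_any_peek or significant
        peeking_hits += flagged_at_any_peek
        final_hits += significant  # result of the last check only
    return peeking_hits / n_tests, final_hits / n_tests

peeking, waiting = false_positive_rates()
print(f"peeking: {peeking:.1%}  waiting for full sample: {waiting:.1%}")
```

Because the final check is also one of the peeks, the peeking rate can never be lower than the waiting rate, and in practice it is usually a good deal higher.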

In Customer.io, we represent statistical significance as ‘Chance to Beat Original’. We assess whether the difference observed between the control and the variation is greater than a difference due to random chance.

If the ‘Chance to Beat Original’ (CTBO) is at 50%, that means that the variation will outperform the control 50% of the time. Since this is the same as random chance, a CTBO of 50% means there is no detectable difference between the control and variation. The closer you are to 50% (e.g. 40% or 60%), the less significant the difference.

The further your CTBO is from 50%, the more likely there is a significant difference between the control and variation. To determine whether the control is beating the variation, or vice versa, Customer.io uses a significance level of 95%. So:

If the CTBO > 95%, the variation is outperforming the control.
If the CTBO < 5%, the control is outperforming the variation.

If your CTBO is between 5% and 95% it doesn’t meet our threshold for statistical significance.
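To make the thresholds concrete, here is a Python sketch. The `interpret` function applies the 5% and 95% cutoffs described above. The `chance_to_beat_original` function is only a stand-in that uses a normal approximation; it is an assumption for illustration and not necessarily how Customer.io calculates CTBO.

```python
from math import sqrt
from statistics import NormalDist

def chance_to_beat_original(control_conv, control_n, variation_conv, variation_n):
    """Approximate chance that the variation's true rate beats the control's.

    Assumption: normal approximation to the difference of two proportions.
    """
    p_c = control_conv / control_n
    p_v = variation_conv / variation_n
    se = sqrt(p_c * (1 - p_c) / control_n + p_v * (1 - p_v) / variation_n)
    if se == 0:
        return 0.5 if p_v == p_c else float(p_v > p_c)
    return NormalDist().cdf((p_v - p_c) / se)

def interpret(ctbo):
    """Apply the 95% / 5% thresholds to a CTBO expressed as a fraction."""
    if ctbo > 0.95:
        return "variation is outperforming control"
    if ctbo < 0.05:
        return "control is outperforming variation"
    return "not statistically significant"

# Hypothetical results: 1,000 recipients per group.
for variation_conversions in (210, 240):
    ctbo = chance_to_beat_original(200, 1000, variation_conversions, 1000)
    print(f"{variation_conversions} conversions: CTBO={ctbo:.0%}, {interpret(ctbo)}")
```

In this made-up example, 210 conversions against 200 looks like a win but doesn’t clear the threshold, while 240 against 200 does.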

We have such a high standard because no matter how many A/B tests you run, if you don’t have true statistical significance you’re wasting your time and effort.

When done correctly, A/B tests are a great way to improve the performance of your emails. A/B tests make it possible to move towards your long-term goals, ensuring each change along the way has a positive impact on your success.

And if you’ve had any successful A/B tests or results you’d like to share, let people know in the comments below.

Happy Testing!

P.S. If you want to set up and run A/B tests on your behavioral emails in Customer.io, we have a complete walk-through in our documentation for campaigns and newsletters. You can also learn how we calculate our CTBO (and how to calculate your own) by visiting our “Understanding Your A/B Test Results” page.
