Monday 7 May 2007

Split testing web pages

Why test your web pages? If you don't, they are unlikely to be as effective as they might be.

As Sumantra Roy says about the art of making landing pages: there's no one right or wrong way and you can't tell what works except by testing.

Split testing web pages is much more complex technically than testing email. It's easy to see how one can create different versions of an email article and send them to subsets of a mailing list, but how do you achieve the same effect on a web page with a single URL?

There are a number of different techniques available, but each will generally provide the following (a rough sketch of the idea follows the list):

  • A way to serve different content to different visitors.
  • A way to identify visitors who've come to the site before (so they see the same content each time).
  • A way to record which content was served and whether it resulted in the desired effect (conversion) or not.
  • Statistical tools to analyse the results.
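
To make the first two requirements concrete, here is a minimal sketch in Python (not based on any particular product) of how a tool might assign a visitor to a variant and keep that assignment stable across visits, by hashing a visitor ID that would normally live in a cookie:

    import hashlib

    VARIANTS = ["headline_a", "headline_b", "headline_c"]

    def assign_variant(visitor_id: str) -> str:
        """Map a visitor to a variant deterministically.

        The same visitor_id (for example, the value of a persistent cookie)
        always lands in the same bucket, so a returning visitor sees the
        same content every time.
        """
        digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
        return VARIANTS[int(digest, 16) % len(VARIANTS)]

    # The visitor ID here is invented; in practice it would come from a cookie.
    print(assign_variant("visitor-12345"))

Recording which variant was served, and whether the visit ended in a conversion, is then just a matter of logging that information alongside the visitor ID; the statistical analysis comes later.
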
The simplest form of test is what is called "univariate": one where only one thing changes. You might for example test only the headline on a page. This is very restricting, but the analysis is mathematically very simple, and you don't need many results to get a reliable conclusion.

The alternative is a "multivariate" test, in which several things can change at once. For example you might have three versions of a headline, two of a product image and four of a call to action - 3 × 2 × 4 = 24 combinations in all. Because of the number of combinations to be tested, the analysis is more complex and you may need many more results before a reliable conclusion can be reached.
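
To see how quickly the combinations mount up, here's a trivial sketch (the variant names are invented) that enumerates every combination in that example:

    from itertools import product

    headlines = ["H1", "H2", "H3"]                   # three headline versions
    images = ["photo_a", "photo_b"]                  # two product images
    ctas = ["Buy now", "Order today", "See the range", "Claim your discount"]  # four calls to action

    combinations = list(product(headlines, images, ctas))
    print(len(combinations))  # 24 -- and each one needs enough traffic to measure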

Some tools will allow you to vary the proportion of visitors to whom a particular variant is shown. This feature is not particularly useful from an analytical point of view, but if you are testing something radically different from the norm you may wish to limit the risk involved by showing it to only a small percentage of visitors.
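
As a rough illustration (again, not any particular tool's API), a weighted allocation might look like the sketch below, with the risky new variant shown to only one visitor in ten; the chosen variant would then be remembered in a cookie so the visitor keeps seeing it:

    import random

    variants = ["control", "radical_redesign"]
    weights = [0.9, 0.1]  # show the risky variant to roughly 10% of visitors

    def allocate() -> str:
        # random.choices draws one item according to the given weights
        return random.choices(variants, weights=weights, k=1)[0]

    print(allocate())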

How do you pick a split testing tool? Don't be seduced by a feature list or by pricing. There are three key questions you should know the answer to before you make a choice:
  1. Is this software technically compatible with your site? (Some tools don't play well with dynamically-generated content, for example).
  2. Do the staff who you expect to operate it have the requisite skills? (If your marketing people don't know HTML, for example, is that going to be an issue?)
  3. Will it interfere with search engine rankings? (Constantly changing content can cause problems with search engines, for fairly obvious reasons.)
In the next few posts I'll look into some of the issues around these and other questions, and how some popular split testing tools stack up.

Saturday 7 April 2007

Split testing email, part 3: what the numbers say


It's been a while since my previous post on this blog but if you read it again, the clue as to my absence is there. "Beware," I said, "there's maths ahead!" Unfortunately when I wrote that I didn't actually have a full grasp of what was involved but I fondly imagined I could quickly mug it all up before posting again.

That proved not to be the case, but now I think I've got it. More or less. Sort of. Near enough, I hope. If anyone spots anything wrong here, please let me know!

Last time, we had three results, each from 666 trials: for headline A, 10; for B, 80; and for C, 90. What we want to know is how useful these results are as measures of the success of the different headlines.

Let's start by assuming that the actual probabilities involved are roughly indicated by the figures. In other words the chance of a click on headline A is roughly 10/666, or 1.5%; that of B (80/666) is 12%; and that of C (90/666) is 13.5%.

For each we can estimate the standard error Se using the standard formula for the standard error of a sample proportion (assuming selection from a sufficiently large population that the finite population correction is near enough to 1):

Se = √ ( p * (1 - p) / n )

Thus: Se(A) = 0.0047; Se(B) = 0.012; Se(C) = 0.013.

For 95% confidence we take roughly twice the standard error (the exact multiplier is 1.96, but 2 is close enough here), which means that we can then give the probabilities as being:

p(A) = 0.015±0.0094
p(B) = 0.12±0.024
p(C) = 0.135±0.026
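
If you want to check the arithmetic yourself, this short Python sketch reproduces the figures above (any small discrepancies are just rounding):

    from math import sqrt

    n = 666
    clicks = {"A": 10, "B": 80, "C": 90}

    for headline, k in clicks.items():
        p = k / n                    # estimated click probability
        se = sqrt(p * (1 - p) / n)   # standard error of a proportion
        low, high = p - 2 * se, p + 2 * se  # rough 95% interval (2 is close to 1.96)
        print(f"{headline}: p = {p:.3f}, Se = {se:.4f}, 95% interval ({low:.3f}, {high:.3f})")

You'll notice that the intervals for B and C overlap substantially, which is exactly the point of the next paragraph.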


You'll see that while we can be quite sure that p(A) is small compared to p(B) and p(C), we can't be so sure about the relative sizes of p(B) and p(C). The actual probability of B might be as high as 0.144 while that of C might be as low as 0.109. To put it in more concrete terms, if we ran a larger trial we might well find that B was a more successful headline than C.

Of course this is a somewhat artificial example. Most marketers would be very happy with one headline that induced over 10% of users to click on it, let alone two. What really matters at the end of the day is not clicks but conversions, and those are what we'll look at next time, as we examine the split testing of web pages.

Monday 12 March 2007

Split testing email, part 2: visits and hits

In my previous post you'll recall we left Charles Farnes-Barnes, our novice internet marketer, anxiously waiting for the results of his email test. He'd created a split test to compare three different variants of a headline in his email newsletter. For the sake of simplicity, we'll refer to them in this article as headlines A, B and C.

The method he'd chosen to evaluate the success of the headlines was to count visits to three different landing pages set up specifically for the test. Something Charles didn't know, but which his canny web expert Hank explained to him, was the difference between visits and hits.

Hits are what a lot of simple counters give you: the number of times a page has been fetched (strictly speaking, a "hit" is any request made to the server, including images and stylesheets, which inflates the figure even further). It's not a very meaningful figure because if you set up a page which your devoted mother opens forty times every day because she has it as her home page, then you'll get forty hits a day even if nobody else ever looks at it. This is one reason amateur webmasters like hit counters: they tend to exaggerate the popularity of a site.

A visit on the other hand is a more scientific measure, in that the analytics software reporting it tries its best to filter out repeated hits from the same source. If it's doing its job, it'll just count your mother once - not very filial perhaps, but fair.
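
To make the distinction concrete, here's a toy sketch (the log entries are invented) of the difference between counting hits and counting visits, where a visit is crudely approximated by counting each source address once. Real analytics software also uses cookies, time windows and browser details, but the principle is the same:

    # Each entry is (source_address, page_requested) -- an invented toy log.
    requests = [
        ("81.2.3.4", "/landing-a"),   # your devoted mother, first thing
        ("81.2.3.4", "/landing-a"),   # your mother, again
        ("81.2.3.4", "/landing-a"),   # ...and again
        ("195.6.7.8", "/landing-a"),  # a genuine new visitor
    ]

    hits = len(requests)                               # every fetch counts
    visits = len({source for source, _ in requests})   # each source counted once

    print(hits, visits)  # 4 hits, but only 2 visits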

Fortunately the analytics software that Hank uses does count visits accurately, so now let's move on to look at the results he got in this case. The three lists were each taken from Charles's master list of 2,000 email addresses. Each list contained 666 or 667 entries taken at random.

The three results were: for list A, 10 visits; for list B, 80 visits; and for list C, 90 visits. Now, what can one conclude from these figures? Clearly headline A is a poor performer compared to B and C, but what about B compared to C? Can Charles conclude that headline C is better than B, or is the difference simply due to random factors?

In other words, is the difference between B and C significant? That's what we're going to look at in the next episode of this exciting saga, but beware, there's maths ahead!

Sunday 11 March 2007

Split testing email, part 1

Yesterday I introduced you to Charles Farnes-Barnes, a novice internet entrepreneur who's just begun tentatively exploring the possibilities of split testing. Charles used a simple split test to establish that a landing page improved the performance of his newsletter, and he was so taken with the results that he's now thinking about other ways to employ the technique.

He thinks about his newsletter. At the beginning of every one he puts a featured item about the new lines in stock, under the same headline, "This week's fashion selection". Perhaps that wording is not very compelling? What about "Fresh in this week: hot new fashions!" or "Stand out from the crowd in the latest gear!"? He decides to try out these alternatives using a split test.

You'll remember that when Charles did the landing page test for his discount offer he used discount codes so he could measure which of the two variants was more successful. This time however he can't use the same technique because the headline relates to items that anyone can buy in his shop. How will he be able to tell which customers came from the newsletter, and also which version of the newsletter they received? He's stumped, so he goes to talk to his web designer, Hank Ubernerd.

Hank is a bright chap so he comes up with an answer right away: multiple landing pages. He'll create one landing page for each headline that Charles wants to test. All the landing pages will look exactly the same, except they'll each have a different URL (web address). He'll be able to tell Charles at the end of the week exactly how many customers came to each one, and hence which headline was the most effective.

How will Hank track the number of visitors? There are a number of different ways he could do this, but since the hosting service for Charles's web site lets him download his site's access log, he chooses to use that in conjunction with a log analyzer program to identify how many different visitors each landing page gets. There are a lot of log analyzers on the market, but Hank likes Mach5 Fast Stats Analyzer for its flexibility and the range of reports it offers. A great alternative for visitor tracking, if you can't get hold of the access log, is Google Analytics, which is an entirely free service and easy to set up (though not guaranteed to be as accurate as a log file analyzer). I'll cover using Google Analytics for tracking purposes in more detail in a later post.
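
For the curious, here's a very rough sketch of the sort of counting a log analyzer does behind the scenes: it reads an access log in Common Log Format, groups requests by landing page and counts each source address once per page. Real analyzers are considerably cleverer about what counts as a distinct visitor, and the file name and page paths here are purely illustrative:

    from collections import defaultdict

    LANDING_PAGES = {"/landing-a", "/landing-b", "/landing-c"}  # illustrative paths

    visitors = defaultdict(set)

    # access.log is assumed to be in Common Log Format, e.g.
    # 81.2.3.4 - - [12/Mar/2007:10:00:00 +0000] "GET /landing-a HTTP/1.1" 200 1234
    with open("access.log") as log:
        for line in log:
            fields = line.split()
            if len(fields) < 7:
                continue
            source, path = fields[0], fields[6]
            if path in LANDING_PAGES:
                visitors[path].add(source)

    for path in sorted(LANDING_PAGES):
        print(path, len(visitors[path]), "distinct visitors")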

Now all Charles has to do is create his three different newsletters: one with the old headline and the others with the two variant headlines he's going to test against it. Each one is linked to its respective landing page. He takes his email address list and ... uhhhh ... he realises he needs to randomize it for the test. Fortunately Hank is able to help again, this time by creating a simple Excel spreadsheet that shuffles the entries. (I'm aware there are better ways to solve this problem, but Charles and Hank don't know them - yet.)
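
For what it's worth, the shuffle-and-split that Hank cooked up in Excel takes only a few lines of Python (this is a sketch, not what Hank actually built, and the addresses are placeholders):

    import random

    # Placeholder addresses standing in for Charles's master list of 2,000.
    addresses = [f"subscriber{i}@example.com" for i in range(2000)]

    random.shuffle(addresses)  # put the list into a random order, in place

    # Deal the shuffled list into three roughly equal sub-lists (667, 667, 666).
    list_a = addresses[0::3]
    list_b = addresses[1::3]
    list_c = addresses[2::3]

    print(len(list_a), len(list_b), len(list_c))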

Now at last Charles is all set. He sends his newsletters out to his three lists and waits for the results...

What happened next? Wait and see!

Saturday 10 March 2007

Split testing for fun and profit: an introduction

Introduction

I've created this blog as a place to write about split testing: a subject which, because it brings together marketing, technology and a certain amount of mathematics, combines several interests in one.

Don't worry, I don't intend to delve very deeply into any of these topics, because I haven't the time to write lengthy articles, and also of course I want the largest possible audience for this blog. Too much marketing or maths is liable to turn some readers off.

So what is split testing? Very simply, it's the process of testing marketing materials by experimentally comparing how well different versions perform. Unlike many traditional forms of market research, it's carried out in the live environment, on actual prospects and customers. This is what makes it rather exciting (and a bit scary too).

I hope to take the scariness away to some extent by explaining it in a way people can understand. I'll start by taking a very simple case, that of an email newsletter. I'll continue to use this same example and expand on it in future entries.

Meet Charles Farnes-Barnes of Farnes-Barnes Fab Gear

Charles Farnes-Barnes sends a newsletter to 2,000 people every week. He uses it to inform his readership about new lines in his online clothes shop and to reward their loyalty with special offers. The special offers link to his website, where readers can claim their reward by entering a special discount code from the newsletter when they make an order.

Charles has been reading up on internet marketing and he's concerned he's not doing as well as he might be with his newsletter. He's heard about this idea of a "landing page" - a page which is designed specifically to act as part of a marketing campaign. So he wonders if instead of sending his newsletter readers to the front page of his site he should try a landing page instead, one that reinforces the messages in the newsletter.

But Charles has also been reading about split testing and he's keen to try that out too. So here's what he does: he creates his landing page and at the same time makes two versions of his next newsletter, one with the special offer linked to the new landing page and the other linked to his site front page as before. The two versions also have two different discount codes: that's important for later.

Then he takes his mailing list and divides it in two, so he has two lists each of 1,000 addresses. Since the order of the email addresses on his list is not significant, he simply splits it in the middle. (However, every subsequent time he does a test he'll have to divide the list differently, ideally by shuffling it randomly before making the split.)

He sends one version of his newsletter to one list and the other to the other. Then he waits for the results. He decides he'll set a deadline of a week, just before he prepares the next newsletter, to see how the test went. When the time arrives he looks at the total number of each discount code that's been used. To make it easy for himself, he used LANDING as the discount code in the letter linked to the new landing page, and HOMEPAGE in the other one.
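
Counting the codes at the end of the week is the easy part. As a sketch, if the week's redeemed codes were exported to a text file with one code per line (a made-up format, purely for illustration), the tally could be as simple as:

    from collections import Counter

    # codes.txt is assumed to hold one redeemed discount code per line.
    with open("codes.txt") as f:
        tally = Counter(line.strip() for line in f if line.strip())

    print(tally["LANDING"], tally["HOMEPAGE"])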

Charles gets his results

As it happens, he finds that 200 people used the LANDING discount code while only 100 used the HOMEPAGE one. That's a pretty clear result, a winning vote for use of a landing page, so he decides that's the way to do it in all his future newsletters.

This was a very simple example of what's called an A/B split test, in which two things are compared one against the other. It's a slightly unusual one, in that most split testing - as we'll see in future examples - is much subtler: the difference between the things being compared is usually not so pronounced. We'll also be looking at how to analyse results when they are less clear cut. Happy testing!