Saturday 7 April 2007

Split testing email, part 3: what the numbers say


It's been a while since my previous post on this blog, but if you read it again, the clue to my absence is there. "Beware," I said, "there's maths ahead!" Unfortunately, when I wrote that I didn't actually have a full grasp of what was involved, but I fondly imagined I could quickly mug it all up before posting again.

That proved not to be the case, but now I think I've got it. More or less. Sort of. Near enough, I hope. If anyone spots anything wrong here, please let me know!

Last time, we had three results, each from 666 trials: headline A got 10 clicks; B got 80; and C got 90. What we want to know is how useful these results are as measures of the success of the different headlines.

Let's start by assuming that the actual probabilities involved are roughly the ones the figures suggest. In other words, the chance of a click on headline A is roughly 10/666, or 1.5%; on B, 80/666 or 12%; and on C, 90/666 or 13.5%.

For each we can estimate the standard error Se using the usual formula for the standard error of a sample proportion (assuming selection from a population large enough that the finite population correction is near enough to 1):

Se = √( p * (1 - p) / n )

Thus: Se(A) = 0.0047; Se(B) = 0.0126; Se(C) = 0.0132.
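If you'd like to check the arithmetic yourself, here's a minimal Python sketch of that calculation (the headline names and click counts are just the figures from the trial above):

import math

n = 666  # trials per headline
clicks = {"A": 10, "B": 80, "C": 90}  # observed clicks from the trial

for name, k in clicks.items():
    p = k / n                        # estimated click probability
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    print(f"Se({name}) = {se:.4f}")

# prints Se(A) = 0.0047, Se(B) = 0.0126, Se(C) = 0.0132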

For 95% confidence, we take roughly twice the standard error (the exact multiplier is 1.96, but 2 is near enough here), which means that we can then give the probabilities as being:

p(A) = 0.015±0.0094
p(B) = 0.120±0.025
p(C) = 0.135±0.026
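And again, a short sketch if you want to reproduce those intervals (it uses the factor of 2 from the text; swap in 1.96 for the textbook version):

import math

n = 666
clicks = {"A": 10, "B": 80, "C": 90}
z = 2  # roughly 2 standard errors for 95% confidence (exactly 1.96)

for name, k in clicks.items():
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    print(f"p({name}) = {p:.3f} ± {z * se:.4f}")

# prints p(A) = 0.015 ± 0.0094, p(B) = 0.120 ± 0.0252, p(C) = 0.135 ± 0.0265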


You'll see that while we can be quite sure that p(A) is small compared with p(B) and p(C), we can't be so sure about the relative sizes of p(B) and p(C): the intervals overlap. The actual probability of B might be as high as 0.145 while that of C might be as low as 0.109. To put it in more concrete terms, if we ran a larger trial we might well find that B was the more successful headline.
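As an aside, there's a textbook way to put a number on that uncertainty which goes a bit beyond what I did above: a two-proportion z-test, which asks how likely a gap this size would be if B and C were actually equally good. A rough sketch, using the same figures:

import math

n = 666
k_b, k_c = 80, 90                   # clicks for headlines B and C
p_b, p_c = k_b / n, k_c / n

# pooled click rate, assuming B and C are really equally good
p_pool = (k_b + k_c) / (2 * n)
se_diff = math.sqrt(p_pool * (1 - p_pool) * (2 / n))

z = (p_c - p_b) / se_diff
# two-sided p-value from the normal distribution
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, p = {p_value:.2f}")  # z = 0.82, p = 0.41

A p-value of about 0.41 says a gap this size between B and C would turn up roughly four times in ten by luck alone, which is the same conclusion the overlapping intervals give, just stated differently.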

Of course, this is a somewhat artificial example. Most marketers would be very happy with one headline that induced over 10% of users to click on it, let alone two. What really matters, at the end of the day, is not clicks but conversions, and those are what we'll look at next time, as we examine the split testing of web pages.