VKI Studios is now Cardinal Path! www.CardinalPath.com

Google Website Optimizer: When is it Advisable to End a Test Early?

Back in January, I wrote a post about the dangers of rushing to judgment based on small numbers. I recommended that even if GWO has declared a winner, it's best to let experiments continue to run at least two weeks and gather at least 100 conversions per version. Today I'm going to play devil's advocate and argue that in some cases, you might just want to pull the plug earlier!

We're currently running an A/B test for a lead-generation website. We believed that the original version of the page had lots of room for improvement, so we were pretty confident we could boost conversions.

GWO very quickly confirmed our suspicions: four days into the test, GWO declared a winner. Our version B was outperforming the original by 139%.



We urged the client to keep the test running, for the reasons discussed in my earlier post. They agreed and the test is still running. In the past few days, the observed improvement has fluctuated, but it's clear the new page is better. It's just a question of how much better.

My normal inclination would be to keep the test running until the numbers settle down. But there is a serious downside to doing so: by continuing to show the losing version to half of our client's visitors, we are potentially costing the client sales. The longer the test runs, the more that potential loss grows!
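To make that trade-off concrete, here is a rough back-of-the-envelope sketch. All the numbers (daily traffic, a 2% baseline conversion rate) are hypothetical illustrations, not our client's actual figures; only the 139% lift comes from the test above:

```python
# Hypothetical figures for illustration only -- not actual client data.
daily_visitors = 1000        # assumed total daily traffic to the test page
control_rate = 0.020         # assumed baseline conversion rate (2%)
lift = 1.39                  # observed improvement from the test: +139%
variant_rate = control_rate * (1 + lift)   # ~4.78%

# With a 50/50 split, half the visitors still see the weaker page.
conversions_5050 = daily_visitors / 2 * (control_rate + variant_rate)
conversions_all_variant = daily_visitors * variant_rate

lost_per_day = conversions_all_variant - conversions_5050
print(f"Conversions/day at 50/50 split:    {conversions_5050:.1f}")
print(f"Conversions/day, variant only:     {conversions_all_variant:.1f}")
print(f"Expected conversions lost per day: {lost_per_day:.1f}")
```

Under these assumed numbers, every extra day of 50/50 testing forfeits roughly a dozen conversions; multiply that by the value of a lead and the cost of "just one more week" becomes very real.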

So... do we keep the test running until we get more precise numbers? Or do we stop the test now, take full advantage of the improved performance of the new page, and move on to the next test?

I'd like to suggest some guidelines for when it's better to end an experiment earlier than normally recommended. If all of the following criteria are met, it may be better to stop the experiment:

  • GWO has declared a winner;
  • The results, though early, indicate a very large difference in performance between pages;
  • There is no reason to doubt the early results (i.e. the large performance difference is not unexpected);
  • There is no reason to expect that seasonal or day-of-week/month factors may have skewed results; and
  • Each conversion has a substantial monetary value (i.e. there's a good chance that keeping the experiment running is costing the client money).
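As a sanity check on the second criterion, the early numbers can be run through a standard two-proportion z-test. This is ordinary statistics, not anything GWO itself exposes, and the conversion counts below are hypothetical figures chosen to produce a +139% lift:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: how unlikely is a gap this large by chance?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical four-day counts: 10/500 (original) vs 24/502 (version B).
z, p = two_proportion_z(10, 500, 24, 502)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Even with only a few dozen conversions, a difference this large clears the conventional p < 0.05 bar, which is exactly why a huge early lift is more trustworthy than a small one.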

It seems to me the bottom line is this: though GWO uses the scientific methodologies of A/B and multivariate testing, it serves marketing, not pure science. We need to know which page performs better, but we don't necessarily need to know precisely how much better.

Keeping an experiment running can be costly. Sometimes it's better just to pull the plug early and move on to the next test.


Comments
Great post -- and a question that I hear a lot. Even if you're pretty sure of the "winning" alternate version, another option is to keep the control running, but downweight it so as to minimize potentially lost conversions. Even with a lower % of users assigned to the control, you still continue to collect data that can reveal whether the early winner's trend continues or not. In your example, you could have run the test for 4 days, then successively downweighted the control to 40%/30%/20%/10% (of overall traffic) over the next 4 days, giving you a full 7+1 days to dampen any potential day-of-week bias.
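Eric's taper can be sketched numerically. Assuming (hypothetically) the same traffic and conversion rates as in the post's example, here is the expected cost of his 40/30/20/10 schedule versus four more days at a 50/50 split:

```python
# Hypothetical rates; the taper schedule is from the comment above.
daily_visitors = 1000
control_rate, variant_rate = 0.020, 0.0478

def expected_conversions(control_share):
    """Expected daily conversions for a given share of traffic on the control."""
    return daily_visitors * (control_share * control_rate
                             + (1 - control_share) * variant_rate)

best_possible = expected_conversions(0.0)   # everyone sees the variant

taper = [0.40, 0.30, 0.20, 0.10]            # Eric's downweighting schedule
lost_taper = sum(best_possible - expected_conversions(s) for s in taper)
lost_5050 = sum(best_possible - expected_conversions(0.5) for _ in range(4))

print(f"Conversions lost over 4 days, tapered: {lost_taper:.1f}")
print(f"Conversions lost over 4 days, 50/50:   {lost_5050:.1f}")
```

Under these assumptions the taper cuts the opportunity cost roughly in half while still gathering control data across a full week, a middle path between stopping immediately and running 50/50 to the end.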

Eric
# Posted By Eric Hansen | 4/18/09 2:21 PM
Good post.
If the change is putting "best practice" into action, and the numbers back this up, then you need to end the test and make the new version live.
Then you can start testing some more changes - again using good practice and what has worked elsewhere.
# Posted By John Hyde (Christchurch New Zealand) | 4/29/09 3:13 AM