
Similar Hypotheses & Retesting


Let’s say you’ve run an experiment in the past. No matter how it fared, a few months later someone wants you to run something very similar: a reworded headline when you’ve already tried six, or a reworked pricing model when you’ve already settled on a (possibly local) maximum.

How do you prioritize and execute experiments that look like something you’ve done in the past? Should you?

What kinds of experiments are we talking about?

Under what circumstances would you run into a situation where you re-run a materially similar test?

There are four possibilities:

Similar experiment on the same element

For example, you’re running a new headline test on the same pages as before, just with a different headline.

Same experiment on a different page

For example, you’re testing a previously-winning call to action on a different landing page.

Same experiment on a different product

For example, you found that a testimonial works best as the headline for a product. But your other products have other testimonials, of course. So you want to try new testimonials on different products: materially the same experiment, but with a couple of variables shifted (new product, different testimonial).

Identical experiment at a different time period

For example, you have a test that happened to run during a sale period, and now you want to run it a second time, during a period when no sale or holiday is occurring.

None of these are unreasonable

All of these are reasonable circumstances under which you’d want to run a similar test. The overall success of a variant means you should want to dive into specifics: what works for which segments, and why. Greater personalization is generally a win for businesses, because you’re better able to anticipate each customer’s real needs.

That said, these can be proposed regardless of the previous experiment’s outcome: after all, a loser during a sale could be a winner when the sale isn’t running.

What matters, then, is whether it actually makes sense to run another version of a materially similar test. And that’s when you run new tests through a prioritization framework.

Let’s go through each of the parameters that we measure for a new experiment, and see how a past test would fare.

Parameter 1: Feasibility

Tests like this are pretty easy to run, because you’re effectively duplicating the work you’ve done on the previous test. So feasibility should score high, probably at least a 9 or a 10.

Parameter 2: Impact

This is the score that varies most, and it’s the one most contingent on the results of the previous test. Are you running this test again because it’s on a high-traffic page, or because the previous test yielded an outsize result? If the latter, then you should probably assign a much higher impact score.

Remember that high-impact tests in either direction, win or lose, are important – because it means you’ve hit on something that really matters to customers. If you had an inconclusive dud of a test, it may be that you’re about to run a test that doesn’t even matter.

Parameter 3: Business Alignment

It’s unlikely that this parameter has changed significantly from the previous test, since you’re running something so similar.

If you’re segmenting to a narrower group of customers, you might want to score it a little lower – unless optimizing for that group has already been defined as a significant focus for the business.

If your re-run test is on a product or page that gets less traffic, or fewer conversions, than the original test, you may want to score it lower as well.

Vet this experiment’s score against your existing queue

Once you’ve scored your experiment, it’s just another experiment. You may have been excited about the previous experiment’s results, but you need to stay fair to the process of optimization – because it could be that you’ve come up with other experiments that actually matter more to the business and your customers.

Prioritize your experiment accordingly, and make sure that you’ve come up with solid reasons for why you settled on each score. That way, you can prioritize tests that may be more important to the business – and more likely to move the needle.
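To make the scoring concrete, here’s a minimal sketch of a prioritization queue. The parameter names follow the three above; the specific experiments, weights (a plain average), and 1–10 scale are illustrative assumptions, not a prescribed framework – adjust them to match how your team actually scores.

```python
from dataclasses import dataclass

@dataclass
class Experiment:
    name: str
    feasibility: int  # 1-10: how easy is it to build and run?
    impact: int       # 1-10: expected effect, informed by past results
    alignment: int    # 1-10: fit with current business priorities

    @property
    def priority(self) -> float:
        # Plain average of the three parameters; weight them
        # differently if your framework favors one over another.
        return (self.feasibility + self.impact + self.alignment) / 3

# A re-run scores high on feasibility, but it still has to beat
# everything else in the queue on overall priority.
queue = [
    Experiment("New checkout flow", feasibility=4, impact=9, alignment=8),
    Experiment("Re-run headline test", feasibility=9, impact=5, alignment=6),
    Experiment("Testimonial on Product B", feasibility=8, impact=7, alignment=7),
]

for exp in sorted(queue, key=lambda e: e.priority, reverse=True):
    print(f"{exp.name}: {exp.priority:.1f}")
```

Note that the re-run’s easy feasibility score isn’t enough on its own: averaged against a middling impact score, it can still land at the bottom of the queue.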

But under what circumstances should you run the exact same test – same change, same page?

When you shouldn’t re-run a test

You mostly shouldn’t do this, to be clear. Retesting is the proverbial definition of insanity: doing the same thing again and expecting different results. If you’re operating at statistically significant sample sizes, you should get the same result, with roughly the same confidence interval.

It’s generally a waste of testing time to retest. This tutorial exists to teach you the limited circumstances under which you might want to retest. Here’s why you don’t:

If the variant didn’t win

Oh, you feel bad about the variant losing? You put a lot of work into it? Boo hoo. If you ran the test properly, then your variant lost, plain and simple.

Acknowledge the reality of the situation, brush it off, learn from it, and move on.

If the primary metric had a high standard deviation

Did you reach the minimum sample size for the test, but your primary metric has a high standard deviation and low confidence? That means the variant ran neck-and-neck with control – not that you failed to gather enough data, or that the data was somehow of lower quality. Re-running the test is likely to give you similar results.
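You can see what “neck-and-neck” looks like with a quick confidence-interval check on the difference between two conversion rates. This sketch uses the standard normal approximation; the traffic and conversion numbers are made up for illustration.

```python
import math

def diff_ci(conv_a, n_a, conv_b, n_b, z=1.96):
    """95% CI for the difference in conversion rates (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# A tiny observed lift with an interval that straddles zero: the test
# was adequately powered, the effect just isn't there. A re-run will
# most likely land in the same place.
low, high = diff_ci(conv_a=500, n_a=10_000, conv_b=510, n_b=10_000)
print(f"Difference in conversion rate: [{low:.4f}, {high:.4f}]")
```

When the interval comfortably contains zero at your full planned sample size, that’s your answer – more identical data won’t move it.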

If there were knock-on effects that raised any concerns

Did your primary metric win, but your secondary metrics suffered? If those metrics also met the requirements for minimum sample size, then you can confidently assume that there’s a trade-off built into your design decision.

Re-running the test will not give you any new insight. Instead, you need to talk with your team and determine what the right course of action is to take. Remember that testing exists to evaluate whether you should roll out a design decision. The decision is either made or not made.

If you want to run a similar experiment with something different, that’s a path you can take as well.

When you should re-run a test

The overall reason you shouldn’t re-run a test? It won’t teach you anything new.

So, when will re-running a test teach you something new?

If the sample size wasn’t large enough

Sample size is an estimate based on your volume of future traffic. If you ran a test for a month and didn’t meet the sample size requirement, and you know you will be able to reach the minimum sample size in a re-test, then you should re-test with a fresh start – not resume the original test and pool old data with new.

Why can’t you restart the same test? Your traffic might be significantly different now, and it is likely to behave substantially differently when deciding whether to convert.
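A minimum sample size is straightforward to estimate before deciding whether a re-test can realistically finish. This is a sketch of the standard two-proportion formula (two-sided test, normal approximation); the baseline rate and minimum detectable effect are example values, not recommendations.

```python
import math

def min_sample_size(baseline_rate, min_effect):
    """Per-variant sample size to detect an absolute lift of min_effect,
    at alpha = 0.05 (two-sided) and 80% power."""
    z_alpha = 1.9599639845400545  # z for alpha/2 = 0.025
    z_beta = 0.8416212335729143   # z for power = 0.80
    p1 = baseline_rate
    p2 = baseline_rate + min_effect
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / min_effect ** 2)
    return math.ceil(n)

# Detecting a 1-point absolute lift on a 5% baseline takes roughly
# 8,000+ visitors per variant. If next month's traffic can't supply
# that, a re-test won't reach significance either.
n = min_sample_size(baseline_rate=0.05, min_effect=0.01)
print(f"Minimum sample size per variant: {n}")
```

Run the numbers against your projected (not historical) traffic: if the re-test still can’t reach this figure in a reasonable window, it doesn’t belong in the queue.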

That brings us to our next point:

If traffic sources have significantly shifted

If your ad spend has shifted substantially, or you’ve moved more to selling over email instead, you might want to re-run a test with your new traffic source in mind.

That doesn’t give you license to re-test everything, for the record. You should only re-test something if you believe the hypothesis will play out significantly differently with the new traffic mix.

When the test is over 6 months old

If the test is old enough, and you believe you’ll get genuinely different insight out of running it again, then you might want to rewrite your hypothesis and kick it off anew, for the same reasons as above: traffic sources shift, buyer expectations change, and competitors enter the market.

I would only re-test an old test if you can clearly point to one of these events happening, though. If you just want to re-test old stuff for no good reason, you’re unlikely to make the best use of your testing time.

After a redesign

This is, in fact, a different sort of test, since you’ve just radically changed the control. You might want to re-test your biggest winners (and losers!) to ensure that any new insights are incorporated in your updated design.

If the previous test involved a bug in control, variant, or both

You should re-run tests that were buggy, since you didn’t get accurate data from them.

Wrapping up

You should not usually re-run tests. You should only re-run tests if you are absolutely sure that you will get new information from them. Otherwise, you should focus on drawing new insight from optimization that will bring additional value to your store.
