Multivariate Testing


Let’s say you want to vet three different changes to a given page (let’s call them changes A, B, and C), and you want to measure the effect of every combination of those changes. Multivariate testing builds a series of variants that all get weighed against control at once:

  • Control.
  • Change A.
  • Change B.
  • Change C.
  • Changes A + B.
  • Changes A + C.
  • Changes B + C.
  • Changes A + B + C.

It then figures out which one performs best.
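If you want to see where that list comes from, here’s a minimal Python sketch that enumerates every combination of a set of changes, starting with control (the empty combination). The function name is purely illustrative; your testing framework builds these variants for you.

```python
from itertools import combinations


def full_factorial_variants(changes):
    """Return every combination of the given changes, control (no changes) first."""
    variants = []
    # Size 0 is control; size len(changes) is every change at once.
    for size in range(len(changes) + 1):
        for combo in combinations(changes, size):
            variants.append(combo)
    return variants


for variant in full_factorial_variants(["A", "B", "C"]):
    print(" + ".join(variant) or "Control")
```

This prints the same eight variants as the list above, in the same order.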

Multivariate tests are terrific for weighing independent but thematically related changes to many different portions of a page in a single test. They’re also great for generating lots and lots of variants and settling on the one that makes the most sense.

If you’re making one change, you have two variants (control and your change). This is a basic A/B test. But as you add changes to a multivariate test, the total number of variations (including control) grows as 2^n, where n is the number of changes: every change you add doubles the count. In practice:

  • 2 changes: 4 variations (control, A, B, A+B).
  • 3 changes: 8 variations (which is what we did above).
  • 4 changes: 16 variations.
  • 5 changes: 32 variations.
  • 6 changes: 64 variations.

And so on. You probably don’t want to run a multivariate test with more than six or so changes, as the number of variations balloons from there. For example, if you’re making 69 changes in a multivariate test, you’re dealing with 2^69, or about 5.90 × 10^20, variations(!). Your variant count should not be measured in moles.
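Here’s the same arithmetic as a quick Python sketch: a full multivariate test has 2^n variations, and dividing 2^69 by Avogadro’s number shows just how close to chemistry that 69-change test gets. The helper name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # entities per mole (exact value in the 2019 SI)


def variation_count(n_changes):
    """Variations in a full multivariate test, control included: 2 ** n."""
    return 2 ** n_changes


for n in range(1, 7):
    print(f"{n} change(s): {variation_count(n)} variations")

huge = variation_count(69)
print(f"69 changes: {huge:.2e} variations, or {huge / AVOGADRO:.1e} mol")
```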

Multivariate testing is different from multiple-variant A/B testing in that you’re vetting every combination of a set of changes. In a typical A/B/C/D test, you could be testing three wildly different variants against a control, which couldn’t be mixed and matched. Think of it as the difference between an A/B/C/D test and an A/B/C/D/BC/CD/BD/BCD test.

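Multivariate tests are very powerful: because every combination runs side by side, they can also show whether two changes perform better together than either does alone.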

What kinds of tests are appropriate for multivariate testing?

Of course, the changes you make in a multivariate test must be independent of one another (no two changes can touch the same part of the page), or you won’t be able to mix them effectively. So, that’s one constraint. What kinds of multivariate tests are appropriate, then?

  • A marketing page that’s broken up by modules. This is a very common interaction model, and you’ve seen it before: pages built from full-width, full-bleed modules stacked one after another as you scroll down. If you already have a page that looks like this, it’s great to make edits to specific modules, or groups of modules, in a more complex multivariate test.
  • Multiple images & videos. Want to settle on a masthead image for a high-traffic blog post that leads to a purchase form (or mailing list CTA if and only if you can clearly measure the value of a single mailing list subscriber)? Multivariate tests are great for vetting the positioning, treatment, and content of a given image on any page – as well as the potential combination of many images, videos, etc.
  • Combining text with visual elements. Is text, image, or video appropriate in a given location? Does it vary from page to page? Multivariate testing helps you settle on the best-performing combination for every page, especially on blog posts.
  • Multiple instances of social proof. Can you measure your numbers of testimonials, corporate logos, and press outlets in scientific notation? A multivariate test can help you settle on the correct number of each, as well as the content thereof.
  • Color schemata. As a designer, I am personally highly unfond of mixing and matching color schemes. As an A/B tester, I find that very, very few color-based tests result in any significant revenue generation for any kind of business. But people do it, and multivariate testing may help you settle on one that works better. For example, if you have a handful of CTA buttons scattered across a long sales page, it may be useful to run a multivariate test that changes each button – assuming you have enough research to support the overuse (or lack of use, or wrong use) of the buttons you already have, of course.
  • Navigation links. I love running tests that remove any navigation links that aren’t solely focused on revenue-generating conversion. With a multivariate test, you can settle on the optimal configuration of inessential navigation links.

I’m not saying that any one of these will generate revenue for your business, of course; the only way to understand what tests will generate revenue for you, in your specific circumstances, is to carefully research it.

What kinds of changes aren’t appropriate as multivariate tests?

Most of them, alas:

  • Mix-and-match value propositions. In order for your product to sell well, it needs to have a clear value proposition, period. You can’t propose a smorgasbord of value propositions in a multivariate test and expect to have it work for your business. It’s much better to research a specific value proposition more carefully by asking paying customers what they really want, and then test that as a single variant later.
  • Multi-page tests. I love running tests across multiple pages of a funnel, but these are not so great for multivariate tests, because you end up with many entirely separate funnels. Imagine the development cost of maintaining all that!
  • Self-contradictory interaction models. This is going to sound terribly obvious, but you need to make sure that your page is interactionally consistent, and that a multivariate test is capable of supporting the broader context that it exists in. I’ve seen multivariate tests settle on a checkout form that contradicts what was promised in the sales pitch, or even on the terms & conditions page: shipment to certain countries, specific discount codes, etc. Remember that paying customers are seeing your test pages. Even if you haven’t settled on a variant yet, you need to make sure that you’re able to provide what you promise.
  • Headlines & copy. These sorts of tests are possible, but I am wildly unfond of mix-and-match copy tests. Why? Because people focus on the text you write more than on any other component of the page, no matter how pretty the page may or may not be. Given that, there is simply no good substitute for copy that makes contextual sense. Multivariate tests on copy might be good for getting a broad-strokes baseline, but you’ll almost certainly need to edit the winning combination afterward for it to read coherently. (And I am personally against A/B testing headlines in a journalistic context, as a headline variant may give an article a dangerous double meaning.)
  • Any overlapping content. Want to change two sets of content that overlap in any capacity? Yeah, you simply can’t do this in a multivariate test, because once variants are combined, both changes would try to rewrite the same content in the final product.
  • Radical reworks. There just isn’t a good way to do this as a multivariate test. You could have a modular page where you test changes of every single module – but that doesn’t change the fact that you have a page with modules, in a certain particular configuration, in the first place.
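One way to sanity-check a proposed test before you build it: write down which page elements each change touches, and flag any pair that shares one. This is a minimal Python sketch of that idea; the change names and element identifiers (written like CSS selectors) are hypothetical, and it can only catch overlaps you’ve actually listed.

```python
from itertools import combinations


def overlapping_pairs(changes):
    """Return pairs of changes that modify at least one of the same page elements.

    `changes` maps a change name to the set of page elements it modifies.
    Any pair returned here can't be cleanly mixed in a multivariate test.
    """
    conflicts = []
    for (name_a, elems_a), (name_b, elems_b) in combinations(changes.items(), 2):
        shared = elems_a & elems_b
        if shared:
            conflicts.append((name_a, name_b, shared))
    return conflicts


# Hypothetical changes and element identifiers, for illustration only.
changes = {
    "A: new masthead image": {"#masthead"},
    "B: testimonial block": {"#social-proof"},
    "C: rewritten masthead headline": {"#masthead", "#headline"},
}

for a, b, shared in overlapping_pairs(changes):
    print(f"{a!r} and {b!r} both touch {sorted(shared)}")
```

Here A and C collide on the masthead, so they belong in separate tests (or need reworking) rather than in one multivariate test.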

In short, if it’s tough to implement as a multivariate test, run a regular test instead. This is not a situation where you want to cram a square peg into a round hole. It will end in sadness.

How do I run a multivariate test?

Your framework should be able to run a multivariate test out of the box. VWO lets you choose a multivariate test on the first page after you start creating a test, and then set up multiple variations of individual elements. Optimizely lets you create a multivariate test right off the bat by changing the experiment type.

Follow the steps in your tool’s documentation, and you should be able to run a multivariate test in whichever framework you use.

What should I keep in mind as I take the plunge into multivariate testing?

First and foremost, multivariate tests are only appropriate for people who have truly bonkers quantities of wallet-out, qualified, revenue-generating traffic. An A/B/C/D test needs at least twice the traffic of a simple A/B test to reach statistical significance, because visitors are split four ways instead of two. A two-change multivariate test also has four variations, so it needs just as much, and every change you add at least doubles the requirement again. Remember that you must always calculate sample size before you kick off a test, and that’s just as important for tests that are more complex in nature.
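To make the traffic math concrete, here’s a rough Python sketch of a sample size calculation. It assumes a two-sided, two-proportion z-test on conversion rate at 5% significance and 80% power, and it applies a Bonferroni correction because every variation is compared against control. The function names and conversion rates are hypothetical, and your framework may use a different statistical method entirely, so treat this as a back-of-the-envelope estimate, not a replacement for its calculator.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, target_rate, alpha=0.05, power=0.8, comparisons=1):
    """Approximate visitors per variation for a two-sided two-proportion z-test.

    `comparisons` splits alpha across that many tests (Bonferroni correction).
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - (alpha / comparisons) / 2)
    z_power = normal.inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    effect = target_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def total_traffic(n_changes, baseline_rate, target_rate):
    """Total visitors for a full multivariate test, each variation compared to control."""
    variations = 2 ** n_changes
    per_variation = sample_size_per_variation(
        baseline_rate, target_rate, comparisons=variations - 1
    )
    return variations * per_variation


for n in (1, 2, 3):
    print(f"{n} change(s): about {total_traffic(n, 0.03, 0.036):,} visitors")
```

With a hypothetical 3% baseline conversion rate and a target of 3.6%, going from one change to two more than doubles the total traffic you need, because you pay for both the extra variations and the stricter significance threshold.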

Next, trust your framework. Frameworks know how to ration out traffic, and good frameworks know how to calculate sample size ahead of time. (VWO in particular does this quite well.) Don’t mess with your framework’s parameters unless you have a very good reason for doing so.
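For the curious, one common way to ration traffic is deterministic bucketing: hash a stable visitor ID together with an experiment ID, and use the result to pick a variation, so a returning visitor always sees the same version. This is a minimal Python sketch of that general technique, not how VWO or Optimizely actually implement it; the function name and IDs are hypothetical.

```python
import hashlib


def assign_variation(user_id: str, experiment_id: str, variations: list[str]) -> str:
    """Deterministically bucket a visitor so they see the same variation every visit."""
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    # The first 8 bytes of a SHA-256 digest give a well-mixed 64-bit integer.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    # Modulo maps the bucket to a variation with (effectively) equal shares.
    return variations[bucket % len(variations)]


variations = ["Control", "A", "B", "A + B"]
print(assign_variation("visitor-123", "homepage-mvt", variations))
```

Including the experiment ID in the hash keeps a visitor’s bucket in one test independent of their bucket in another.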

Finally, multivariate tests are terrific candidates for running bandit experiments, because of the high quantity of variants and high likelihood that many of them won’t work out well. Bandit experiments are a type of test where your variations’ traffic share increases or decreases based on their success. Remember to use a less greedy bandit algorithm to reduce the risk of killing off potential winners.
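Here’s a minimal Python sketch of one simple bandit algorithm, epsilon-greedy, to show what “greedy” means in practice. Most of the time it sends a visitor to the variation with the best observed conversion rate; a fraction epsilon of the time it picks one at random, so raising epsilon makes it less greedy. The class name and the simulated conversion rates are made up, and your framework may well use a different algorithm (Thompson sampling is a common one).

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over a fixed set of variations."""

    def __init__(self, variations, epsilon=0.2, seed=None):
        self.variations = list(variations)
        self.epsilon = epsilon  # share of traffic reserved for exploration
        self.visitors = {v: 0 for v in self.variations}
        self.conversions = {v: 0 for v in self.variations}
        self.rng = random.Random(seed)

    def rate(self, variation):
        shown = self.visitors[variation]
        return self.conversions[variation] / shown if shown else 0.0

    def choose(self):
        # Show every variation at least once before comparing rates.
        for v in self.variations:
            if self.visitors[v] == 0:
                return v
        # Explore: with probability epsilon, pick any variation at random.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variations)
        # Exploit: otherwise pick the best observed conversion rate so far.
        return max(self.variations, key=self.rate)

    def record(self, variation, converted):
        self.visitors[variation] += 1
        self.conversions[variation] += int(converted)


# Simulated visitors with made-up "true" conversion rates.
true_rates = {"Control": 0.02, "A": 0.03, "B": 0.01, "A + B": 0.08}
bandit = EpsilonGreedyBandit(true_rates, epsilon=0.2, seed=42)
sim = random.Random(7)
for _ in range(20_000):
    variation = bandit.choose()
    bandit.record(variation, sim.random() < true_rates[variation])

for v in bandit.variations:
    print(f"{v}: {bandit.visitors[v]} visitors, {bandit.rate(v):.3f} observed rate")
```

Because the exploration share never drops to zero, a variation that had an unlucky start still gets enough traffic to recover, which is exactly the risk a less greedy setting guards against.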
