How do you know what to test next? Every optimization program has a common thread: making sense of all of the design decisions you could test, and sorting them in an order that makes sense for the business. You need a way to sort new design decisions consistently. This allows individual egos to take a back seat, flattening the org chart and embracing rationality in the process.

Obviously, we have our own methodology in the Draft Method. Here’s what you can do to prioritize new test ideas quickly.

First, ignore any ideas that are not – or can’t be – supported by research. We’ve mentioned this over and over in the past, but researched test ideas always outperform vague guessing.

The Draft Method accounts for this by making research a prerequisite for any test idea. If an idea hasn’t been researched, you need to follow the hunch and confirm it through analytics, heat maps, usability tests, or customer interviews.

There are 3 parameters that you should assess for every test idea, each scored from 0 to 10:

Parameter 1: Feasibility

First, how easy is it to build the thing? Note that high scores are good in our prioritization method, so we’ll be starting from 10 (representing the easiest possible implementation) and working our way down to 0.

If you’re using a framework to build it yourself

The easiest tests are the ones that you can build yourself in your experimentation framework.

  • If you’re making strictly copy changes or removing semantically-scoped elements, score a 10. You can do this yourself in your sleep.
  • If you’re rearranging elements, incorporating different sets of changes for mobile & desktop, or writing any CSS to hide or unhide specific features or dynamic functionality, score a 9.
  • If you have to incorporate JavaScript in your prototype, score an 8.

If you’re working with developers

If you need to enlist others to build out any server-side functionality, start at 8, and subtract a point for each of the following that’s true, stopping at 0:

  • Separate versions need to be built for mobile & desktop.
  • Dynamic functionality needs to be built for every product in a collection page.
  • Filtering or sorting elements are affected in any way other than removal.
  • New standard-issue elements are being added to a page.
  • New scarcity plays are being added to a page.
  • Simple dynamic functionality, like a countdown timer to a static time, is being added to a page.

And subtract two points for any of the following that’s true:

  • Specific features need to be rolled out for wildcard pages, such as different copy for every single product detail page.
  • A high-level framework needs to be built out, such as a feature flag.
  • Elements are affected that have direct functional ramifications for the customer’s transaction (such as the add to cart button, upsells, etc), mandating more QA before launch.
  • Multiple pages need to be adapted to handle edge cases or address a consistent customer experience.
  • Significant portions of technical debt need to be navigated.
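
The feasibility rules above reduce to a little arithmetic. Here’s a minimal Python sketch of it, not an official part of the Draft Method: the names `SelfBuilt` and `developer_feasibility` are made up for illustration, and you count for yourself how many of the one-point and two-point conditions apply.

```python
from enum import Enum


class SelfBuilt(Enum):
    """Scores for tests you build yourself, as listed above."""
    COPY_OR_REMOVAL = 10    # copy changes or removing elements
    REARRANGE_OR_CSS = 9    # rearranging, mobile/desktop variants, CSS show/hide
    JAVASCRIPT = 8          # prototype needs JavaScript


def developer_feasibility(one_point_hits: int, two_point_hits: int) -> int:
    """Score a developer-built test: start at 8, subtract, stop at 0.

    one_point_hits: how many of the six one-point conditions are true.
    two_point_hits: how many of the five two-point conditions are true.
    """
    if not 0 <= one_point_hits <= 6:
        raise ValueError("there are six one-point conditions")
    if not 0 <= two_point_hits <= 5:
        raise ValueError("there are five two-point conditions")
    return max(0, 8 - one_point_hits - 2 * two_point_hits)


# Hypothetical idea: separate mobile & desktop builds, a new scarcity play,
# and a change that touches the add to cart button.
print(developer_feasibility(one_point_hits=2, two_point_hits=1))
```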

Don’t throw away low scores

With our other two prioritization metrics, we recommend throwing away any design decisions that score below a set cutoff. With this metric, though, you may encounter difficult implementations that remain high-reward. Impact and business-strategic fit matter more than the difficulty of implementation.

As a result, you should keep all ideas, even ones that happen to score a zero. You may find an easier way to build it in the future, or the work of building it out may pay off elsewhere. High-risk ideas can still be high-reward.

Parameter 2: Impact

Impact is the hardest and most important metric to measure, and it’s also the highest priority when vetting new experiment ideas in the middle of a downturn.

Now more than ever, you need to maximize the win rate of your experiments. Now is not the time to play fast & loose with your experimentation framework. You don’t have the time or the luxury.

Set 1: add a point for yes, subtract a point for no

  • Will over 80% of customers see the change?
  • Does the change apply to a segment of customers that converts at least 1/3 below the store average?
  • Is the change mobile first?
  • Is the change significant enough to change over half of the page’s elements?
  • Is the element located above the fold on mobile?
  • Does the element map directly to customer motivations?
  • Does the test occur on a page that’s within the conversion funnel (home, collection, product, cart, checkout)?
  • Does the change specifically map to a purchasing decision?

Set 2: add a point for yes, do nothing for no

  • Does the element promote scarcity?
  • Does the element promote increased AOV (e.g. upsells, add-ons) or CLTV (e.g. subscriptions)?
  • Does the element simplify the page?
  • Is the element noticeable within 1 second on average on behavior recordings?
  • Has a previous experiment been run on this element?

Set 3: add a point for a success, subtract 2 points otherwise

If you answered “yes” to the last question in Set 2, did that experiment succeed? Add a point if it did; subtract 2 points if it failed or produced a null result. Skip this question if you answered “no” to the last question in Set 2.

Here’s how you score this:

  1. Start with 5 points.
  2. Go through each group of questions, and add & subtract points accordingly.
  3. Discard the idea if you score below a 7.
  4. Prioritize it like you normally would if you score a 7 or above.
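
If you’d rather not tally this by hand, the scoring steps can be sketched in a few lines of Python. This is an illustration under a couple of assumptions, not a definitive implementation: the function names are invented, a failed previous experiment is treated the same as a null result (minus 2), and the total is returned uncapped because the steps above don’t mention a ceiling.

```python
from typing import Optional, Sequence


def impact_score(
    set1_answers: Sequence[bool],
    set2_answers: Sequence[bool],
    previous_test_succeeded: Optional[bool] = None,
) -> int:
    """Tally an impact score from yes/no answers.

    set1_answers: one bool per Set 1 question (+1 for yes, -1 for no).
    set2_answers: one bool per Set 2 question (+1 for yes, 0 for no).
    previous_test_succeeded: None if no previous experiment was run;
        True adds a point, False (failure or null result) subtracts 2.
    """
    score = 5  # everyone starts with 5 points
    for yes in set1_answers:
        score += 1 if yes else -1
    score += sum(1 for yes in set2_answers if yes)
    if previous_test_succeeded is not None:
        score += 1 if previous_test_succeeded else -2
    return score


def keep_idea(score: int) -> bool:
    """Discard anything that scores below a 7."""
    return score >= 7


# Hypothetical idea: six yeses in Set 1, two in Set 2 (including a
# previous experiment), and that previous experiment won.
score = impact_score(
    [True] * 6 + [False] * 2,
    [True, False, False, False, True],
    previous_test_succeeded=True,
)
print(score, keep_idea(score))
```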

Parameter 3: Business Alignment

Finally, how closely does the decision jibe with your business’s long-term strategy?

First, you need a strategy

Optimization isn’t for businesses that lack a deliberate strategy. If you’re just playing defense, or if your strategy shifts day to day, then you’re probably reading the wrong blog.

That being said, as of this writing, we’re in the middle of a major shift in the economy and society. I think everyone gets a pass when changing their strategy. Heck, Draft is changing our strategy. So now is a great time to be conceiving of a strategy if you haven’t done so before – or reworking a strategy that you might have set during a bygone time.

Let’s talk about how to do that.

How are you adapting to the new normal?

If you aren’t changing right now, your business is headed for the graveyard. Write down 5 things that your business is doing, or hopes to do soon, to adapt to the current state of society.

What products are you releasing over the next year?

If you’re planning on releasing anything new, write down everything that you’re planning on releasing between now and this time next year.

For every product you write down, also write down what you will need from the business with respect to messaging, marketing, traffic generation, and information architecture.

How are you addressing competition?

Since competitive analyses are vital components of any research routine, you’ll probably want to incorporate any response to competition into your business strategy.

First, if you haven’t run a competitive analysis yet, you need to do so. It’ll take about an afternoon.

Then, come up with your 3 biggest competitors, and lay out specific plans for addressing each one. Are you shifting your messaging? Targeting different customers? Researching new design decisions for their appropriateness to your store?

Are you entering any new markets?

Obviously, your business is contingent on the market that you’re serving. If you’re planning on shifting focus to serve a different group of people anytime in the next year, write out who you currently serve, who you plan on serving, and any changes in tactics that you’ll need in order to serve them better.

Scoring new test ideas

Once you have your strategic plan together, compile every component of your strategy into a list. For each new decision, start at a score of 5 out of 10 and check the decision against each component on the list. Scoring goes as follows:

  • Add 1 point if the decision supports the strategy.
  • Subtract 2 points if the decision goes against the strategy.
  • Do nothing if the decision isn’t applicable or could go either way.

Here, you’re subtracting more than you’re adding as a check on rash decisions that might go against your business’s long-term strategy. If you want to be even more cautious, subtract 3 points, instead.

Cut any decisions that score a 4 or below, and sum up the rest with your other prioritization scores.
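
As a rough sketch, the alignment tally looks like this in Python. The `Fit` enum and the function names are hypothetical, and the code assumes you judge the decision once against each component of your strategy list; the `penalty` argument covers the more cautious 3-point variant.

```python
from enum import Enum
from typing import Iterable


class Fit(Enum):
    SUPPORTS = "supports"   # decision supports this strategy component
    OPPOSES = "opposes"     # decision goes against it
    NEUTRAL = "neutral"     # not applicable, or could go either way


def alignment_score(fits: Iterable[Fit], penalty: int = 2) -> int:
    """Start at 5; +1 per supported component, -penalty per opposed one."""
    score = 5
    for fit in fits:
        if fit is Fit.SUPPORTS:
            score += 1
        elif fit is Fit.OPPOSES:
            score -= penalty
    return score


def keep_decision(score: int) -> bool:
    """Cut any decision that scores a 4 or below."""
    return score > 4


# Hypothetical decision judged against four strategy components.
fits = [Fit.SUPPORTS, Fit.SUPPORTS, Fit.OPPOSES, Fit.NEUTRAL]
print(alignment_score(fits), alignment_score(fits, penalty=3))
```

Note how the same decision survives with the default penalty and gets cut with the stricter one.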

Add ‘Em Up

You now have an aggregate number from 0 to 30.

At this point, I usually throw away any tests that score below 10; there’s always lower-hanging fruit to be found, even if it requires more research from the team.

And then I sort the rest, going slowly down the list from 30 to 10 – and always trying to find more high-ranking tests in the meantime. Tool-wise, Trello – or some similar sort of kanban board – is a great way to manage the order of tests. Here’s the template we use at Draft for all of our clients, if you need a place to get started.
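
If your backlog lives in a spreadsheet or script rather than a kanban board, the add-and-sort step might look something like the Python below. Treat it as a sketch: the `Idea` class, the idea names, and their scores are all invented, and clamping each parameter to the 0 to 10 range is my assumption so that the total stays between 0 and 30.

```python
from dataclasses import dataclass
from typing import Iterable, List


def clamp(score: int) -> int:
    """Assumption: hold each parameter to 0-10 so totals stay in 0-30."""
    return max(0, min(10, score))


@dataclass(frozen=True)
class Idea:
    name: str
    feasibility: int
    impact: int
    alignment: int

    @property
    def total(self) -> int:
        return clamp(self.feasibility) + clamp(self.impact) + clamp(self.alignment)


def prioritize(ideas: Iterable[Idea], cutoff: int = 10) -> List[Idea]:
    """Drop ideas whose total is below the cutoff; sort the rest high to low."""
    kept = [idea for idea in ideas if idea.total >= cutoff]
    return sorted(kept, key=lambda idea: idea.total, reverse=True)


# Made-up scores, chosen only to exercise the cutoff and the sort.
backlog = [
    Idea("Rebuild checkout flow", 1, 8, 6),
    Idea("Animate footer logo", 2, 4, 3),
    Idea("Rewrite product headline", 10, 9, 7),
]
for idea in prioritize(backlog):
    print(idea.total, idea.name)
```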

Prioritizing one-off changes

In short, we recommend prioritizing experiments by three parameters:

  • Feasibility
  • Impact
  • Business alignment

But what about one-off fixes? Do they get prioritized in the same way?


Feasibility & impact still matter, but over time we’ve come to decide that a third metric matters more than business alignment when it comes to one-off fixes.

Why? Because fixing bugs aligns with business strategy. After all, on what planet will you encounter an issue on Android and hear “no, that’s not part of our strategy, let’s keep it”?

So let’s talk about what else to focus on, instead: context fit.

What is context fit?

I spent 34 pages describing context in my first book, so that’s the long answer.

The short answer is that context fit is defined by whether a new or changed element stylistically connects to the elements around it. Some elements should stand out – in a way that is consistent.

For example, any calls to action that move a customer to the next step in the funnel should share a distinct color that stays consistent from step to step and isn’t used anywhere else in the store’s layout. Contextually they stand out, but for a good reason.

Determining context fit

Here are the questions we answer at Draft to determine context fit. Start with 10 points.

  • Does the changed element use the same styling as the rest of the layout? If no, deduct 2 points.
  • Does the changed element use the same typography as the rest of the layout? If no, deduct 2 points.
  • Does the changed or new element make a substantial change to the layout across devices? If yes, deduct 2 points.
  • Does the changed or new element make a substantial change to the interaction model? If yes, deduct 2 points.
  • If applicable, does the copy in the element fit the voice & tone of the rest of the business? If no, deduct 2 points.
  • If significant changes are being made, what is the conversion- or usability-focused justification for them? If there is significant justification for each deduction, add 1 point.

Your final score is out of 10. Add this to your scores for feasibility & impact, and you have the same score, out of 30, that you normally would when prioritizing experiments.
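
Here’s one way to express the context fit questions as a small Python sketch. The field names are hypothetical, and two readings are my own assumptions: the justification bonus is applied once (only when every deduction is justified), and it is skipped when there are no deductions so the score never exceeds 10.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextAnswers:
    same_styling: bool               # matches the rest of the layout?
    same_typography: bool            # matches the layout's typography?
    changes_layout: bool             # substantial layout change across devices?
    changes_interaction: bool        # substantial change to the interaction model?
    copy_fits_voice: Optional[bool]  # None when the element has no copy
    all_deductions_justified: bool = False


def context_fit_score(a: ContextAnswers) -> int:
    """Start at 10, deduct 2 per failed question, add 1 if all are justified."""
    deductions = sum([
        not a.same_styling,
        not a.same_typography,
        a.changes_layout,
        a.changes_interaction,
        a.copy_fits_voice is False,  # "not applicable" costs nothing
    ])
    score = 10 - 2 * deductions
    if deductions > 0 and a.all_deductions_justified:
        score += 1
    return score


# Hypothetical fix: off-brand styling and copy, plus a layout shift,
# each backed by a usability justification.
fix = ContextAnswers(False, True, True, False, False, all_deductions_justified=True)
print(context_fit_score(fix))
```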

Some design decisions might need to be changed to fit context. This is good for customers and good for your business. Nobody wants surprises while they’re browsing your store; those add cognitive overhead, which reduces conversion. The goal is to score higher on context fit, not to let poorly-fitting fixes slide down the prioritization scale.

Other Prioritization Frameworks

Over the years, various consultancies and businesses have publicized their own frameworks for prioritization. Let’s take a look at a few of the most popular ones, and determine what works and what could be improved in each.


PIE is WiderFunnel’s prioritization framework. The three letters in this particular TLA correspond to each of the metrics you’ll be scoring: potential, importance, and ease.

Of the frameworks we’re analyzing here, PIE is the most similar to the one that we use at Draft, at least on its face. Where it differs is in how each of the values is calculated – and how it sorts each hypothesis. The potential value is mostly grounded in research, and the importance value mostly connects to past tests. This is great if you run lots of experiments, and less great if you don’t already have an established experimentation program.


ICE stands for “impact, confidence, ease,” which are the three metrics that are used to evaluate new experiment ideas.

This seems like a great idea, but the actual calculation of each value is mostly left up to the reader. I’d only implement ICE if you have a clear sense of what goes into the calculation of each value. Make sure you write down your criteria to get your team on the same page before you start using this framework.


PXL is the framework that they use at ConversionXL. It weighs 10 different factors, and (usually) scores them a 1 or a 0 based on whether each factor is true or not.

PXL is by far the simplest of the frameworks we’re discussing here, and its questions are easy to answer. You can tell if something is above the page fold or not, for example. But within it is a weird complication around research: you can have a relatively high-scoring idea that is unsupported by research. We don’t test anything at Draft if it fails to be supported by research.

Remember that you’re never done generating new test ideas, and refreshing the list with high-priority, high-impact tests is the best way to keep an optimization practice fresh!
