There is a reason that the mantra “test everything” is so popular.
But when marketers talk about testing, what they are really getting at is a call to be data-driven. In fact, testing is the bedrock of running your enterprise in a data-driven way.
Few would argue against the idea that building a data-driven culture is a key to success. But the conversation doesn’t stop there. The real trick is to create a practical, data-driven mindset that is also efficient. How can you make most of your decisions data-driven, rather than just a few? It requires a mindset that doesn’t let the perfect become the enemy of the good.
Meaning what, you ask? The reality is that testing everything is impractical. And while a “perfect” A/B test, one that measures the impact of a change on revenue, is the gold standard, there are circumstances that call for a different testing approach; circumstances I’ll get to in a minute.
Every time you make a change to a website, you should think about measuring the effects of that change. The first step in this thought process should be to consider how you can reach a good, data-driven conclusion as quickly and as cost-effectively as possible. What level of testing is required, given the potential value and uncertainty of the change? What knowledge will you gain with a test and how much are you willing to spend in time, resources, money and opportunities lost to get that information? What do you do when the cost of an ideal test is simply too high for the anticipated return?
That’s when it’s time to think of another test: one that won’t yield the perfect answer, but will still give you a very good one.
Admittedly, when it comes to designing the proper test, determining what qualifies as “good enough” can be difficult. The idea here is to keep in mind that testing is not all or nothing. It’s not a choice between running a bulletproof test on the exact metric you care about and running no test at all.
The exercise can be a lot easier if you have a framework that aligns the test you’re conducting with the move you’re making and the result you’re hoping to affect.
Let’s put that notion to work on four common scenarios and consider the sort of test that each would call for:
- The perfect A/B test: The gold standard test yields the metric that everyone wants to see, namely revenue or conversions, in whatever form that takes for your business, e.g. orders, sign-ups, etc. It comes with a high resource, time and opportunity cost. When a site has a low, single-digit conversion rate, you need a lot of traffic to get a statistically significant answer on revenue. And by definition, during the course of the test, only a portion of your visitors will enjoy the benefit of whatever change you’ve made; the control group will not share in any improvement. So, when to use it? Say you’ve made a big change, a completely new page layout, for example. That’s a pretty big deal and you aren’t sure whether it will outperform the old layout. Because the new layout affects a large number of high-traffic pages, you will see enough visitors to reach significance within a relatively short period of time.
- Modified A/B test: What if you’re contemplating changing the navigation on a single category page to highlight products that you think customers would like more than the products you currently highlight? But collecting sufficient revenue data would mean running a test for months, which would mean that you won’t get a read on this change before you need to change the navigation again because of a seasonal shift in demand. Absurd, right? What’s an alternative that, admittedly, won’t give you a perfect read, but will still be a good one? One approach is to look at higher-funnel metrics, like bounce rate or click-through rate. Basing your test on these metrics will give you sufficient data to come to a conclusion in a shorter time period. If bounce rates and click-through rates are reasonably correlated with revenue, then you can feel good that the change is one for the better — and you can potentially get this data in a couple of weeks versus months. In following this path, you should, of course, still look at the revenue or conversion data. If the positive or negative impact of your change is large enough, you might still be able to get a statistically significant read on those lower-funnel metrics; significance is a function of the size of the effect as well as the size of the sample. But if the impact isn’t big enough, then using higher-funnel metrics can be better than having no data at all — or misleading, lower-funnel data that is not statistically significant.
- A test when you know that you don’t have sufficient data to run an A/B test: Consider a case in which you’re changing the imagery on pages that get decent traffic, but not enough to run an A/B test — even with a traffic metric — in any reasonable time frame. Or what happens if you have other A/B tests already running and you just can’t run that many tests at the same time? If you used data to come up with your idea for the change in the first place, and that data indicates the change is unlikely to be negative, then the best available alternative is a before-and-after analysis, with a healthy dose of applied judgment. Before-and-after analyses have a dubious reputation. Without applied judgment, they can be very misleading, mostly because of externalities. What happens if you change the imagery and, right after you do, the products on that page go on sale? Any externality — a change outside of the experiment that affects the same metrics you are trying to analyze — can make it hard to draw conclusions from a before-and-after test. However, if there are not a lot of externalities, or you can apply judgment to adjust the analysis to take them into account, then a before-and-after analysis is better than the alternative, i.e. no analysis.
- A change is right but the cost of measuring it isn’t worth it: Your data-driven analysis, experience, expertise and/or common sense tell you a certain change is the right one to make. But you also know that measuring the change isn’t worth the cost, given the potential value in making it. For instance, the holiday shopping season is upon you and you have what you think is a better hero image for a page that doesn’t get a lot of traffic. This is the time to do what you think is right. Don’t let the lack of measurement prevent you from doing something that is good for your site.
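To make the traffic problem in the perfect A/B test concrete, here is a minimal sketch of the standard sample-size arithmetic for a conversion-rate test, using the usual normal approximation at 95% confidence and 80% power. The function name and the example rates are illustrative, not from the article:

```python
from math import ceil, sqrt

def sample_size_per_variant(baseline_rate, min_detectable_lift,
                            z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variant to detect an absolute
    lift in conversion rate (two-sided test, 95% confidence, 80% power,
    normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_lift
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# A 2% baseline conversion rate and a hoped-for 0.4% absolute lift
# (a 20% relative improvement) needs roughly 21,000 visitors per variant.
print(sample_size_per_variant(0.02, 0.004))
```

This is why low, single-digit conversion rates make revenue-based tests slow: halving the detectable lift roughly quadruples the required traffic.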
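For the modified A/B test, a higher-funnel metric like click-through rate can be checked with an ordinary two-proportion z-test. The sketch below assumes per-variant click and visit counts; the counts in the example are invented for illustration:

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, visits_a, clicks_b, visits_b):
    """Two-sided z-test for a difference in click-through rates
    between variant A and variant B. Returns (z, p_value)."""
    p_a = clicks_a / visits_a
    p_b = clicks_b / visits_b
    # Pooled rate under the null hypothesis of no difference
    p_pool = (clicks_a + clicks_b) / (visits_a + visits_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visits_a + 1 / visits_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 6.0% vs 7.2% click-through rate on 5,000 visits each:
z, p = two_proportion_z_test(300, 5000, 360, 5000)
print(z, p)
```

Because click-through rates run far higher than conversion rates, the same traffic yields significance much sooner here than a revenue test would.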
Your go-to instinct is to test, and to test rigorously. It’s a good instinct; the right one, in fact. But be sure to temper it with a practical, data-driven mindset and a refusal to let the perfect be the enemy of the good.
BloomReach provides online marketing and personalization technology.