Start With What Actually Matters: Picking the Right Thing to Test
A/B testing your website means showing two versions of a page element to different visitors, measuring which version produces more of the outcome you care about, and then keeping the winner. That is the entire concept. The hard part is not the mechanics; it is knowing what to test, when to test it, and how to read the results without fooling yourself.
Most teams get this backwards. They install a testing tool, change a button colour, run the test for four days, see a 0.3% difference, and declare victory. That is not A/B testing. That is theatre. Real testing requires a hypothesis, adequate traffic, statistical discipline, and a willingness to act on what you learn. This article walks through the entire process the way we approach it when running conversion programmes for mid-market B2B companies.
Why Most A/B Tests Fail Before They Start
The number one reason A/B tests produce meaningless results is that teams test trivial changes on pages that do not matter. Swapping “Submit” for “Get Started” on a form that only gets 40 visits a month will never reach statistical significance. You will wait weeks, get inconclusive data, and conclude that testing “doesn’t work for us.”
The second most common failure is testing without a hypothesis. A hypothesis is not “I wonder if green converts better than blue.” A hypothesis sounds like this: “We believe that replacing the generic hero headline with a specific outcome statement will increase demo requests by at least 15%, because our heatmap data shows 68% of visitors leave the page without scrolling past the hero section.” That sentence contains a belief, a measurable prediction, and evidence that prompted the test. Without all three, you are guessing.
In our conversion audits, the most common issue we find is not a lack of testing ideas. It is a lack of prioritisation. Teams have dozens of things they could test, but they spend their limited traffic budget on low-impact experiments because nobody has mapped out where the biggest leaks actually are.
Before You Test: Audit Where Visitors Drop Off
You cannot test your way to a better website if you do not know where visitors are abandoning. Before opening any testing tool, spend a week collecting behavioural data that tells you where the real problems live.
Analytics: Find Your Leaky Pages
Open Google Analytics (or whatever analytics platform you use) and look at your top landing pages by traffic volume. For each one, check the bounce rate and the exit rate. Pages with high traffic and high bounce rates are your prime testing candidates, because even a small percentage improvement translates into meaningful numbers when thousands of visitors are involved.
Next, look at your conversion funnel. If you have a multi-step process (say, landing page to pricing page to contact form to thank-you page), identify where the biggest percentage drop-off happens. That drop-off point is almost always where you should focus your first test. Fixing a 60% abandonment rate between your pricing page and contact form will do far more than tweaking your homepage headline.
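As an illustration, here is a minimal sketch of that drop-off calculation. The funnel steps and visitor counts below are hypothetical placeholders; substitute the unique-visitor numbers exported from your own analytics.

```python
# Minimal sketch: find the biggest drop-off in a multi-step funnel.
# The step names and counts are hypothetical placeholders — replace them
# with the unique-visitor counts from your analytics platform.
funnel = [
    ("landing page", 12_000),
    ("pricing page", 4_800),
    ("contact form", 1_900),
    ("thank-you page", 760),
]

worst_step, worst_drop = None, 0.0
for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
    drop = 1 - count / prev_count          # fraction of visitors lost at this step
    print(f"{prev_name} -> {name}: {drop:.0%} drop-off")
    if drop > worst_drop:
        worst_step, worst_drop = f"{prev_name} -> {name}", drop

print(f"Biggest leak: {worst_step} ({worst_drop:.0%}) — test here first")
```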
Heatmaps and Session Recordings
Quantitative data tells you where visitors leave. Qualitative data tells you why. Install a tool like Hotjar or Microsoft Clarity and collect at least 200 session recordings on your most important pages. Watch how people actually use the page. Do they scroll past your value proposition without pausing? Do they hover over a pricing element and then leave? Do they click on something that is not actually a link?
Heatmaps show you aggregate scroll depth and click patterns. If 70% of your visitors never scroll below the fold on a page where your primary call to action sits at the bottom, you do not need a test to know the CTA needs to move. Some fixes are obvious enough that testing them wastes time. Save your tests for changes where the outcome is genuinely uncertain.

Choosing a Testing Tool
For most mid-market companies, the right A/B testing tool is the simplest one that integrates with your existing analytics. Here are the options worth considering:
- Google Optimize: Google retired Optimize in September 2023 and has not released a free, built-in replacement; GA4 now integrates with third-party testing platforms instead, so most teams will need one of the paid options below.
- VWO (Visual Website Optimizer): Strong visual editor, good for teams without developer support, solid statistical engine. Plans start around $200/month.
- Optimizely: Enterprise-grade, powerful for complex multi-page experiments, but priced accordingly. Overkill for most teams under 100 employees.
- Convert: Mid-range price, privacy-focused, reliable. A good choice for B2B companies that need GDPR compliance without enterprise costs.
The tool matters far less than the process. Do not spend three months evaluating platforms. Pick one that your team can actually use, install the snippet, verify it loads correctly, and move on to the work that matters: forming hypotheses and running experiments.
How to Structure a Test Properly
Every A/B test has five components. Miss any one of them and your results are unreliable.
1. A Specific, Measurable Hypothesis
We covered this above, but it bears repeating because it is the foundation. Your hypothesis should state what you are changing, what metric you expect to move, by roughly how much, and why you believe this will happen. Write it down before you touch the testing tool. Share it with your team. If you cannot articulate why you expect a change to work, you are not ready to test it.
2. A Single Variable
Change one thing at a time. If you change the headline, the hero image, and the CTA button simultaneously, and the variant wins, you have no idea which change drove the improvement. You might keep a worse headline paired with a better image, never knowing which element actually mattered.
The exception is full-page redesign tests, sometimes called “radical redesign” tests, where you test an entirely new page layout against the existing one. These are useful when you have reason to believe the entire page structure is broken, but they trade diagnostic precision for speed. You learn whether the new approach works, but not which specific element made the difference. We typically recommend starting with single-variable tests and reserving radical redesigns for situations where analytics data suggests the whole page framework is failing.
3. Adequate Sample Size
This is where most teams get into trouble. You need to determine your required sample size before you start the test, not after. Use a sample size calculator (there are free ones from Evan Miller, VWO, and others). You will need to input three numbers: your current conversion rate on the page, the minimum detectable effect (the smallest improvement you care about), and your desired statistical significance level (typically 95%). Most calculators also assume a statistical power of 80% by default, which is a sensible setting to keep.
For example, if your current landing page converts at 3% and you want to detect a 20% relative improvement (meaning a lift to 3.6%), you will need roughly 15,000 visitors per variation at 95% significance. That means 30,000 total visitors to complete the test. If your page gets 500 visitors a week, this test will take 60 weeks. That is not practical. You either need to test a page with more traffic, aim for a larger effect size, or accept a lower significance threshold.
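For readers who want to see the arithmetic, here is a rough sketch of the standard two-proportion sample-size formula that those calculators use. It assumes a two-sided 5% significance level and 80% statistical power (the common calculator defaults); different power settings and rounding conventions explain why published figures, such as the roughly 15,000 quoted above, vary slightly.

```python
# Sketch of the sample-size maths behind the example above.
# Assumes a two-sided 5% significance level and 80% power — the usual
# calculator defaults; the article itself only fixes the 95% significance.
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_variation(baseline, relative_lift, alpha=0.05, power=0.80):
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * pooled * (1 - pooled))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

n = sample_size_per_variation(0.03, 0.20)   # 3% baseline, 20% relative lift
print(n)                                    # roughly 14,000 per variation
print(round(2 * n / 500))                   # ~56 weeks at 500 visitors/week
```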
This maths is non-negotiable. Running a test without calculating sample size in advance is the single most common mistake in website optimisation, and it produces false positives that lead teams to implement changes that actually hurt performance.
4. Sufficient Duration
Even if you hit your sample size in three days, run the test for at least two full business weeks. Visitor behaviour varies by day of week. B2B sites often see dramatically different traffic patterns on Tuesday versus Saturday. If your test only captures weekday data, your results will not hold up when weekend traffic is included. Similarly, end-of-month behaviour can differ from mid-month behaviour in industries where purchasing cycles follow calendar patterns.
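Putting the sample-size and duration rules together, a small sketch like the one below estimates how long a test will actually take. The traffic figures are hypothetical placeholders.

```python
# Sketch: estimated test duration, respecting a two-week minimum.
# weekly_visitors and the required sample sizes are placeholder numbers.
from math import ceil

def test_duration_days(total_sample_needed, weekly_visitors, minimum_days=14):
    days_for_sample = ceil(total_sample_needed / weekly_visitors * 7)
    return max(days_for_sample, minimum_days)          # never end before two full weeks

print(test_duration_days(total_sample_needed=30_000, weekly_visitors=12_000))  # 18 days
print(test_duration_days(total_sample_needed=4_000, weekly_visitors=12_000))   # 14-day floor applies
```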
5. A Single Primary Metric
Decide in advance which metric defines success. Is it form submissions? Demo requests? Click-through rate to the pricing page? Revenue per visitor? Pick one. You can track secondary metrics for learning purposes, but your go/no-go decision should rest on a single primary metric. If you evaluate five metrics and declare victory when any one of them improves, you have a 23% chance of seeing a false positive even at 95% significance. Statisticians call this the multiple comparisons problem. Practitioners call it “finding patterns in noise.”
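The 23% figure is straightforward to verify: it is the probability that at least one of five independent metrics clears the 95% threshold by chance alone.

```python
# Where the 23% figure comes from: the chance that at least one of five
# independent metrics crosses the 95% significance threshold purely by chance.
alpha = 0.05
metrics = 5
false_positive_risk = 1 - (1 - alpha) ** metrics
print(f"{false_positive_risk:.0%}")   # ~23%
```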
What to Test: A Priority Framework
Not all tests are created equal. We use a prioritisation framework based on three factors: potential impact, confidence in the hypothesis, and ease of implementation. Score each test idea from 1 to 10 on each factor, multiply the scores together, and rank by the product. This is a variant of the ICE framework adapted for conversion work.
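A minimal sketch of that scoring, using hypothetical test ideas and scores, might look like this:

```python
# Sketch of the impact x confidence x ease scoring described above.
# The test ideas and their 1-10 scores are hypothetical examples.
ideas = [
    {"name": "Rewrite hero headline",         "impact": 8, "confidence": 7, "ease": 9},
    {"name": "Add outcome-based testimonial", "impact": 6, "confidence": 8, "ease": 8},
    {"name": "Change CTA button colour",      "impact": 2, "confidence": 3, "ease": 10},
]

for idea in ideas:
    idea["score"] = idea["impact"] * idea["confidence"] * idea["ease"]

for idea in sorted(ideas, key=lambda i: i["score"], reverse=True):
    print(f'{idea["score"]:>4}  {idea["name"]}')
```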
In practice, the highest-priority tests almost always fall into these categories, roughly in order of typical impact:
Value Proposition and Headlines
Your headline is the first thing visitors process. If it does not immediately communicate what you do, who you do it for, and why they should care, nothing else on the page matters. We regularly see headline tests produce 20-40% lifts in engagement metrics because the original headline was either too vague (“Innovative Solutions for Modern Businesses”) or too focused on features rather than outcomes (“AI-Powered Analytics Platform” versus “See Which Deals Will Close This Quarter”).
Call-to-Action Placement, Copy, and Context
CTA tests go beyond button colour. The most impactful CTA tests we run involve what surrounds the button, not the button itself. Adding a single line of supporting text beneath a “Request a Demo” button that says “30-minute call, no commitment, we’ll show you your data” can double click-through rates because it reduces uncertainty. The visitor knows exactly what will happen when they click.
Social Proof and Trust Signals
Where you place testimonials, case study snippets, client logos, and trust badges has an outsized effect on conversion. Our team recommends testing the placement and specificity of proof elements rather than their mere presence. A testimonial that says “Great product, highly recommend” does almost nothing. A testimonial that says “We reduced our sales cycle from 47 days to 29 days within three months” is a conversion asset. Test specific, outcome-focused proof against generic praise. The specific version wins nearly every time.
Form Length and Field Composition
The conventional wisdom that shorter forms convert better is true in aggregate but misleading in practice. For B2B companies, removing fields can increase submissions while decreasing lead quality. What you actually want to optimise is qualified submissions, not total submissions. Test adding a qualifying question (like company size or use case) and measure downstream metrics: did more of those leads become real conversations? Sometimes a longer form that filters out poor-fit enquiries produces better business results despite a lower raw conversion rate.
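One way to make that comparison concrete is to score each form variant on qualified leads per visitor rather than raw submissions. The counts below are hypothetical placeholders.

```python
# Sketch: judge form variants on qualified leads, not raw submissions.
# All visitor, submission, and qualification counts are hypothetical.
variants = {
    "short form (4 fields)": {"visitors": 5_000, "submissions": 250, "qualified": 60},
    "long form (7 fields)":  {"visitors": 5_000, "submissions": 190, "qualified": 95},
}

for name, v in variants.items():
    raw_rate = v["submissions"] / v["visitors"]
    qualified_rate = v["qualified"] / v["visitors"]
    print(f"{name}: {raw_rate:.1%} raw, {qualified_rate:.1%} qualified")

# The longer form converts worse on paper but delivers more real opportunities.
```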
Page Structure and Information Hierarchy
This is where testing intersects with what we describe in our conversion systems guide: the sequence in which a visitor encounters information, proof, and friction points determines whether they convert. Testing the order of page sections (for example, moving a “How It Works” section above the pricing overview versus below it) often reveals that visitors need certain questions answered before they are willing to engage with commercial information. These structural tests can be more impactful than any copy change.

Reading Results Without Fooling Yourself
When your test reaches the required sample size and has run for at least two weeks, it is time to analyse. Your testing tool will show you a conversion rate for each variation, a percentage difference, and a statistical significance level.
Only act on results that reach 95% significance or higher. If your tool shows a 12% improvement at 87% significance, you do not have a winner. You have noise that looks like a signal. Resist the temptation to call it early. We have seen teams implement “winners” at 85% significance only to watch conversion rates drop below the original baseline within a month.
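If you want to sanity-check what your tool reports, many frequentist testing tools run something close to a two-proportion z-test; others use Bayesian methods, so treat this as a rough sketch rather than a description of any particular product. The visitor and conversion counts are hypothetical.

```python
# Rough sketch of the two-proportion z-test many (frequentist) tools report.
# Visitor and conversion counts are hypothetical placeholders.
from math import sqrt
from statistics import NormalDist

def ab_significance(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # two-sided
    return p_b / p_a - 1, 1 - p_value                  # relative lift, "significance"

lift, significance = ab_significance(conv_a=450, n_a=15_000, conv_b=530, n_b=15_000)
print(f"lift {lift:.1%}, significance {significance:.1%}")  # act only at 95% or higher
```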
Also watch for segmentation effects. A variant might win overall but lose badly among your most valuable audience segment. If you sell to both small businesses and mid-market companies, and the test winner appeals to small businesses while repelling mid-market buyers, implementing it could hurt revenue even though the top-line conversion rate improved. Always segment your results by the dimensions that matter to your business: device type, traffic source, company size if you can track it, and new versus returning visitors.
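A simple way to apply that check is to recompute conversion rates per segment and confirm the overall winner holds up in each one. The per-segment counts here are hypothetical.

```python
# Sketch: check whether the overall winner also wins in each key segment.
# Segment conversion and visitor counts are hypothetical placeholders.
segments = {
    "small business": {"a": (300, 8_000), "b": (400, 8_000)},   # (conversions, visitors)
    "mid-market":     {"a": (150, 7_000), "b": (130, 7_000)},
}

for name, data in segments.items():
    rate_a = data["a"][0] / data["a"][1]
    rate_b = data["b"][0] / data["b"][1]
    verdict = "variant wins" if rate_b > rate_a else "control wins"
    print(f"{name}: control {rate_a:.1%} vs variant {rate_b:.1%} -> {verdict}")
```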
What to Do With Losing Tests
A test where neither variant wins is not a failure. It is information. It tells you that the element you changed is not the bottleneck for that page. This is valuable because it redirects your attention to the elements that do matter. Document every test, including losers, with the hypothesis, the result, and what you learned. Over time, this testing log becomes your most valuable conversion asset because it encodes institutional knowledge about what your specific audience responds to.
Common Mistakes That Waste Months of Effort
Beyond the statistical pitfalls already covered, here are the practical mistakes we see most often when working with teams who are new to testing:
Testing on low-traffic pages. If a page gets fewer than 1,000 visitors per month, you almost certainly cannot run a meaningful test on it. Focus your testing programme on your highest-traffic pages and use best-practice principles (informed by tests on other pages) to improve low-traffic pages without formal experiments.
Stopping tests early when results look good. Statistical significance fluctuates wildly in the early days of a test. It is common to see 99% significance after 200 visitors that drops to 60% after 2,000. The early result was a mirage. Always wait for the pre-calculated sample size.
Running too many tests simultaneously. If you run tests on your homepage, pricing page, and contact form at the same time, and a visitor sees all three variations, you cannot isolate which change affected their behaviour. On high-traffic sites with distinct user journeys, parallel tests are manageable. On most mid-market sites, run one test at a time or test on pages that share minimal overlap in visitor paths.
Ignoring page load impact. Some testing tools inject JavaScript that adds 200-500ms of page load time. On mobile connections, this can be worse. If your testing tool slows down your site noticeably, it may suppress conversion rates for both variations, making your baseline look worse than it actually is. Monitor your page speed during tests and ensure the tool is not introducing its own conversion penalty.
Building a Sustainable Testing Programme
One-off tests produce one-off insights. The real value of A/B testing comes from building a continuous testing rhythm where each test informs the next. A strong programme looks like this:
Each month, review your analytics and behavioural data. Identify the biggest conversion leak. Form a hypothesis about what is causing it. Design a test. Run the test until it reaches significance. Implement the winner (or document the learning from a losing test). Move on to the next leak.
Over twelve months, a team running one test per month on their highest-traffic pages will typically produce three to five statistically significant winners that compound into meaningful conversion improvements. A site that converts at 2% can realistically reach 3-4% within a year through disciplined, sequential testing. That does not sound dramatic, but for a B2B company generating 10,000 monthly visits, moving from 2% to 3.5% means an additional 150 leads per month from the same traffic. At typical B2B deal values, that change funds itself many times over.
The teams that succeed with testing share a common trait: they treat it as an ongoing operational discipline, not a one-time project. They allocate time each month, they document results systematically, and they let the data override opinions. When the CEO’s preferred headline loses to a plainer, more specific alternative, they implement the winner anyway. That willingness to follow evidence over intuition is what separates companies that genuinely improve their websites from those that just talk about it.
Getting Started This Week
If you have never run an A/B test before, here is a practical starting point. Install a heatmap tool on your top three landing pages and collect two weeks of data. Identify the page with the highest traffic and the most obvious behavioural problem (visitors leaving before scrolling, clicking elements that are not links, ignoring your CTA). Form a hypothesis about one change that could address that problem. Use a sample size calculator to confirm the test is feasible given your traffic levels. Set up the test, commit to running it for the full required duration, and read the results honestly when it finishes.
That single test will teach you more about your website, your visitors, and the testing process than any amount of reading. The second test will be faster. The third will be sharper. Within a quarter, you will have a working system for continuous improvement that makes every other marketing investment more effective, because the website those investments drive traffic to actually converts.


