
A/B Test Planning Template

Run structured A/B tests that produce actionable insights. Covers hypothesis formation, variable isolation, sample size calculation, and results documentation.


A/B testing is one of the highest-leverage activities in content and growth marketing — it lets you learn what works for your specific audience rather than relying on industry benchmarks that may not apply to you. The problem is that most A/B tests are run poorly: too many variables changed at once, sample sizes too small to be significant, tests ended too early, and learnings documented nowhere.

This template gives you a rigorous process for planning, running, and extracting learning from A/B tests across your content and marketing channels.

Before You Run Any A/B Test

Ask three questions:

  1. Is this a real hypothesis or just a hunch? A test needs a specific hypothesis ("We believe X will outperform Y because Z") — not just "let's see which version people prefer."

  2. Do we have enough traffic to reach statistical significance? Running a test on a page that gets 50 visitors a month could take years to produce a statistically significant result. Prioritize low-traffic elements by potential impact, not just curiosity.

  3. Is this the highest-leverage thing to test right now? A/B testing a button color when your conversion rate is 0.2% (probably a messaging problem) is a waste of effort. Fix the big things first, then optimize.


Part 1: Test Registry

Maintain a living list of all A/B tests your team has run or is planning. This prevents duplicate testing, surfaces learnings across teams, and builds your institutional knowledge base.

Test ID | Date Started | Page/Element | Hypothesis | Result | Status
AT-001 | | | | Winner: Control/Variant | Complete
AT-002 | | | | | Running
AT-003 | | | | | Planned


Part 2: Individual Test Plan

Complete one of these for every A/B test before you start.


Test ID: _______________ | Date: _______________

Test Name: _______________________________________________ (A clear, searchable name: "Homepage Hero Headline Test — Q1 2026")

Test Owner: _______________________________________________

Test Channel / Page: _______________________________________________ (e.g., Homepage, Pricing page, Email subject lines, LinkedIn ad copy)


Step 1: Define the Problem

What specific problem or opportunity prompted this test?

Current performance (the baseline you're trying to improve):

  • Metric: _______________________________________________
  • Current value: _______________________________________________ (e.g., Homepage conversion rate: 2.1%, Email open rate: 28%, Ad CTR: 0.8%)

Why you believe there's room to improve:


What data / observation / customer feedback prompted this test:



Step 2: Write the Hypothesis

A proper testing hypothesis has three parts: the change you're making, the outcome you expect, and the reason you expect it.

Format: "If we [change], then [outcome] will [improve/change], because [reason]."

Your hypothesis:

If we _______________________________________________, then _______________________________________________ will _______________________________________________, because _______________________________________________.

Example hypothesis: "If we change the homepage headline from 'AI content for startups' to 'Publish a week's worth of content in two hours,' then the sign-up conversion rate will increase, because the new headline communicates a specific time-saving outcome rather than a feature description, which better matches what our ICP cares about."


Step 3: Define the Test Variable(s)

What are you changing? (limit to ONE variable per test)

  • Headline / H1
  • Subheadline / supporting copy
  • CTA button text
  • CTA button color / placement
  • Email subject line
  • Email from name
  • Email preview text
  • Hero image / video
  • Social proof element (testimonial, logo placement, review badge)
  • Price display / pricing structure
  • Form length / fields
  • Page layout / content order
  • Offer / incentive (e.g., free trial vs. demo CTA)
  • Other: _______________________________________________

Control (A) — what you currently have:


Variant (B) — what you're testing:


If testing multiple variants (A/B/C test): List variants C, D, etc.:

  • Variant C: _______________________________________________
  • Variant D: _______________________________________________

(Note: Multiple variants require proportionally larger sample sizes — factor this into your sample size calculation)


Step 4: Define Your Success Metric

Primary metric (the one number this test is designed to move):


(e.g., conversion rate, click-through rate, open rate, form completion rate)

Secondary metrics (to watch but not to judge the test by):




Guardrail metrics (metrics you're monitoring to ensure the variant doesn't harm something else):



(e.g., testing a more aggressive CTA — track that it doesn't increase bounce rate or decrease qualified lead rate)


Step 5: Calculate Required Sample Size

Statistical significance requires enough data to distinguish a real signal from random noise. Use a sample size calculator (VWO, Evan Miller's online calculator, or similar) to determine how long to run your test.

Inputs to the calculator:

  • Current baseline conversion rate: ___%
  • Minimum detectable effect (the smallest improvement you'd care about): ___% (Recommendation: 10-20% relative improvement for most tests. If your baseline is 2%, a 10% improvement means reaching 2.2%)
  • Statistical significance level: 95% (standard) / 90% (acceptable for lower-stakes tests)
  • Statistical power: 80% (standard)

Result from calculator:

  • Required sample size per variant: _______________ visitors/emails/impressions
  • Estimated time to reach this with current traffic: _______________ days

Minimum test duration (do not end earlier, even if results look promising): _______________ days

Maximum test duration (end here even if underpowered — then decide whether to prioritize more traffic): _______________ days

Rule of thumb: Run tests for a minimum of 2 weeks regardless of traffic levels. This accounts for day-of-week variation.
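If you want to sanity-check a calculator's output, the underlying math is straightforward. A minimal Python sketch of the standard two-proportion, normal-approximation sample size formula (the function name and defaults here are our own; this is the same math most online calculators use):

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift in a
    conversion rate (two-sided test, normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)           # e.g. 0.84 at 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# 2.1% baseline, detect a 10% relative lift (to 2.31%):
n = sample_size_per_variant(0.021, 0.10)
```

Note how quickly the required sample grows as the baseline rate or the minimum detectable effect shrinks; this is why small relative improvements on low-converting pages are so expensive to test.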


Step 6: Determine Traffic Split

Traffic allocation:

  • Control (A): ____%
  • Variant (B): ____%
  • Variant (C, if applicable): ____%

Standard split: 50/50 for two-variant tests. If you're testing a more radical change on a high-traffic page and want to limit exposure risk, a 90/10 or 80/20 split is acceptable — but note this will significantly increase time to statistical significance.


Step 7: Implementation Notes

Test tool / platform being used:


(e.g., Google Optimize, Optimizely, VWO, HubSpot A/B, Mailchimp A/B, native platform testing)

QA checklist before launch:

  • Control version displays correctly
  • Variant version displays correctly
  • Tracking/analytics events fire correctly for both versions
  • UTM parameters are configured
  • Test is not running simultaneously with other tests on the same page
  • Team is informed not to manually send traffic differentially to one version

Launch date: _______________________________________________

Scheduled end date (do not end early): _______________________________________________


Part 3: Monitoring During the Test

Check-in schedule: _______________________________________________ (Recommendation: check weekly for technical issues only; avoid judging interim results, which reduces the temptation to end the test prematurely)

Interim check-in notes:

Date | Sample Size (A) | Sample Size (B) | Metric (A) | Metric (B) | Notes

Red flags to watch for:

  • One version has technical issues (traffic drops to zero, unusually high bounce)
  • External event has affected traffic patterns (sale, PR mention, ad campaign)
  • Guardrail metric has declined significantly

Part 4: Results and Analysis

Complete after test ends.

Test end date: _______________________________________________

Final sample sizes:

  • Control (A): _______________ visitors/impressions
  • Variant (B): _______________ visitors/impressions

Final results:

  • Control (A) [metric]: ___%
  • Variant (B) [metric]: ___%
  • Relative difference: ___% improvement / decline
  • Statistical significance achieved: Yes / No
  • Confidence level: ___%
  • P-value (if applicable): ___

Winner: Control (A) / Variant (B) / No statistically significant difference

Decision: Implement variant / Keep control / Run follow-up test
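If your testing tool doesn't report a p-value, a two-sided two-proportion z-test is simple to run yourself. A minimal Python sketch (the function name and the example figures below are invented for illustration):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.
    Returns (relative lift of B over A, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)     # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    lift = (p_b - p_a) / p_a
    return lift, p_value

# e.g. control: 210 conversions / 10,000 visitors; variant: 265 / 10,000
lift, p = two_proportion_z_test(210, 10000, 265, 10000)
```

A p-value below 0.05 corresponds to the 95% significance threshold used throughout this template; remember that it is only valid if you stopped at your pre-specified sample size rather than peeking.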


Analysis and Interpretation

If the variant won:

Why do you believe it won? (Link back to your hypothesis)


Is the result consistent with your hypothesis or surprising?


What does this tell you about your audience's behavior or preferences?


If the control won (variant did not improve):

Why might the variant have underperformed your hypothesis?


What would you test next based on this result?


If the result was inconclusive (no statistical significance):

Was the effect size smaller than expected (real, but too small to detect with available traffic)?


Should you run a bigger test or prioritize a different test instead?



Secondary Metric Analysis

Secondary Metric | Control (A) | Variant (B) | Change | Analysis

Did any guardrail metrics move in a concerning direction?




Part 5: Documentation and Learnings

This is the section most teams skip — which is why they keep re-running the same tests and relearning the same lessons.

One-sentence summary of what this test taught us:


How this changes our understanding of our audience or product:


Implications for future tests (what should we test next?):




Implications for our content strategy or messaging:


Should this result influence other pages or channels? (i.e., if an email subject line framing won, should we test the same framing on landing page headlines?)



A/B Testing Priority Matrix

Use this matrix to prioritize which tests to run next. Score each potential test on two dimensions:

  • Impact (1-5): How much could this move our primary metric if we find a winner?
  • Ease (1-5): How quickly and cheaply can we run this test?

Potential Test | Impact | Ease | Priority Score (Impact × Ease)

Run next: the item with the highest priority score on your list.
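The matrix is simple enough to script if your backlog lives in a spreadsheet export. An illustrative Python sketch (the test names and scores below are made up):

```python
def prioritize(tests):
    """Sort candidate tests by Impact x Ease (both scored 1-5),
    highest priority first."""
    return sorted(tests, key=lambda t: t["impact"] * t["ease"], reverse=True)

backlog = [
    {"name": "Homepage hero headline", "impact": 5, "ease": 4},  # score 20
    {"name": "CTA button color", "impact": 2, "ease": 5},        # score 10
    {"name": "Pricing page layout", "impact": 4, "ease": 2},     # score 8
]
ranked = prioritize(backlog)
```

Multiplying rather than averaging the two scores penalizes tests that are weak on either dimension, which matches the intent of the matrix.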


Frequently Asked Questions

How do I know if my sample size is large enough?

Use a sample size calculator with your baseline conversion rate and minimum detectable effect. As a rough guide at 95% significance and 80% power: for a page converting at 3% where you want to detect a 20% relative improvement (to 3.6%), you need roughly 14,000 visitors per variant. For an email open rate test at 30% with a 10% relative improvement target (3 percentage points), you need roughly 3,800 emails per variant.

What's the most common A/B testing mistake?

Ending tests early when a variant looks like it's winning. Early data is noisy. A variant that looks like it's winning at Day 3 with 200 visitors per variant often reverts to parity or loss by Day 14. The only valid stopping point is when you've reached your pre-specified sample size, not when you've seen enough data to feel comfortable.

Can I run multiple A/B tests on the same page simultaneously?

Only if you can guarantee that different tests affect different users and don't interact. In practice, most teams avoid simultaneous page tests. Email A/B tests (subject lines, content variants) can run simultaneously since each subscriber only gets one email per campaign.

What statistical significance threshold should I use?

95% is the standard for most marketing tests (meaning that if there were no real difference, you would see a false positive only 5% of the time). For lower-stakes decisions (which social post format to use), 90% may be acceptable. For high-stakes decisions (pricing changes, major homepage redesigns), consider requiring 97-99% significance.

How do I handle an A/B test where the variant wins on the primary metric but loses on a secondary metric?

Dig into the secondary metric. If the variant increased sign-ups but decreased trial-to-paid conversion, you may have found a message that attracts a lower-quality audience. The primary metric "won" but the business outcome may be worse. When primary and secondary metrics diverge, investigate before implementing the winner.
