Generate A/B Test Ideas for Your Funnel

You'll end up with: A prioritized backlog of funnel A/B test ideas—each with a clear hypothesis, primary metric, stage, effort tags, and a one-line implementation note—ready to run or hand to a designer or developer.

Overview
Time: 20–30 min
Level: Intermediate
Cost: Free
Tools: 2
Cost breakdown
  • Claude: Free
  • Google Sheets: Free
  • Total: Free
Common mistake

Brainstorming clever copy variants without tying each test to one stage metric or enough volume to learn—so nothing ships or you run everything at once. Fix: in Step 1 lock one north-star metric for this session; in Step 4 kill or downgrade tests that need traffic you don't have; in Step 5 rank to five and only fully spec the top three.

Before you start
  • Your funnel stages from first touch to conversion (bullets are OK)
  • One primary conversion you care about this quarter (e.g. booked calls, checkout, trial signups)
  • Approx traffic or list size at the weakest stage (order of magnitude is fine)
  • One place you already see numbers (analytics export, screenshot, or honest gut like "~70% bounce on pricing")
  • Claude and Google Sheets open in two tabs
Step 1: Lock the funnel, stages, and one north-star metric

Mirror your funnel and pick one primary metric before any test ideas—stops random headline brainstorms.

Claude (Free)
Exact action

1. Open https://claude.ai and start a new chat. Keep this single thread open through Step 5.

2. Paste and fill every bracket:

   I am prepping a batch of funnel A/B tests. Do NOT propose test ideas yet.

   Product / offer (one sentence): [...]
   Who we sell to (one sentence): [...]
   Funnel stages in order from first touch to conversion (bullets are fine): [...]
   North-star metric for THIS batch (pick exactly one, e.g. booked calls, trial signups, checkout completion, qualified leads): [...]
   Planning horizon (e.g. next 30 days): [...]

   Reply with ONLY:
   (a) A markdown table: Stage | Job_of_stage | Typical_drop_off_guess
   (b) Up to 5 missing numbers or facts you still need before ideation
   (c) One sentence restating the single north-star metric for this batch

   Rules: No hypotheses. No copy or layout ideas. No tool recommendations.

3. Read Claude's reply. If it lists tests anyway, send: "Stop. No test ideas in this thread yet; regenerate (a)-(c) only." If a stage is missing, answer briefly, then: "Regenerate the table only."

You have a short table naming every funnel stage and one explicit north-star metric for this batch. Claude has not proposed any tests yet.
Claude jumped straight to headlines, variants, or "try testing X"—reply with the stop line in step 3 and paste the prompt again with "No test ideas" in the first line.
Step 2: Inventory friction (symptoms, not solutions)

List observable frictions per stage—evidence vs assumption—so later hypotheses map to real symptoms.

Claude (Free)
Exact action

1. In the same Claude chat as Step 1, paste:

   Using ONLY the funnel table and north-star from above, for each stage list 3–6 friction bullets.
   Each bullet must be an observable symptom (confusion, anxiety, mismatch, speed, trust, proof gap, pricing clarity), not a solution or test yet.
   Tag every bullet either Evidence or Assumption.
   Evidence must cite one of: analytics number, support ticket theme, sales-call pattern (last 3 calls), refund/churn reason, or email reply pattern.
   If you cannot find at least one Evidence bullet for a stage, write NEEDS_DATA and name the smallest metric to collect.
   Do not propose tests.

2. Add your own Evidence tags where you have them; do not leave every line as Assumption.

Each stage has at least three friction bullets; each bullet is tagged Evidence or Assumption; every stage has at least one Evidence bullet or an explicit NEEDS_DATA line.
Bullets are generic ("bad UX", "low conversion")—reply: "Rewrite every bullet as an observable user behavior or quote-style symptom; no generic labels."
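If you copy the friction list out of the chat, the success check above can be sketched in a few lines of Python. This is a minimal sketch, not part of the original workflow: the function name `check_friction_inventory` and the `(stage, tag, text)` tuple layout are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def check_friction_inventory(bullets):
    """Check the Step 2 rules for a list of (stage, tag, text) tuples.

    tag is assumed to be 'Evidence', 'Assumption', or 'NEEDS_DATA'.
    Returns {stage: [problem, ...]} for stages that fail a rule.
    """
    by_stage = defaultdict(list)
    for stage, tag, text in bullets:
        by_stage[stage].append((tag, text))

    problems = {}
    for stage, rows in by_stage.items():
        issues = []
        # Only Evidence/Assumption lines count as friction bullets.
        friction = [r for r in rows if r[0] in ("Evidence", "Assumption")]
        if len(friction) < 3:
            issues.append("fewer than 3 friction bullets")
        # Each stage needs one Evidence bullet or an explicit NEEDS_DATA line.
        if not any(tag in ("Evidence", "NEEDS_DATA") for tag, _ in rows):
            issues.append("no Evidence bullet and no NEEDS_DATA line")
        if issues:
            problems[stage] = issues
    return problems
```

An empty result means every stage you listed passes the check; stages missing from the input are not detected, so compare the keys against your funnel table by eye.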
Step 3: Convert frictions into a burst of test hypotheses

Generate 15–25 atomic hypotheses in If / Then / Because form, typed as copy, layout, offer, proof, speed, or pricing presentation.

Claude (Free)
Exact action

1. In the same chat, paste the funnel table from Step 1 and the friction lists from Step 2.

2. Ask Claude:

   Generate at least 15 and at most 25 A/B test hypotheses.
   Rules:
   - One atomic change per row (no bundles like "rewrite page and change offer").
   - Use this exact sentence shape: If we change [element] for [segment], we expect [metric] to go [up/down] because [one-line mechanism].
   - Type must be exactly one of: Copy | Layout | Offer | Proof | Speed | Pricing presentation
   - Primary metric must match a stage metric (CTR, bounce, form start, form complete, reply rate, booking rate, checkout, AOV, etc.), not a vanity metric unless it is tied to the stage job.
   Output a markdown table with columns: Stage | Hypothesis | Type | Primary_metric | Mechanism_one_line

3. Scan for duplicate mechanisms. If two rows differ only in adjectives, merge or delete the duplicates and ask Claude to output the cleaned table only.

You have a markdown table with at least 15 rows; each row is one atomic change; Types are only from the allowed list.
Rows bundle multiple changes—reply: "Split into atomic tests—one change per row. Re-output the full table."
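The row count, allowed Types, and sentence shape can also be checked mechanically once the table is copied out of the chat. The sketch below is an illustration under assumptions: `validate_hypotheses`, `ALLOWED_TYPES`, and `SHAPE` are hypothetical names, rows are assumed to be dicts keyed by the table's column names, and the regular expression is a loose match on the If / expect / because shape. It cannot tell whether a row bundles several changes; that still needs a human read.

```python
import re

# Types allowed by the Step 3 prompt.
ALLOWED_TYPES = {"Copy", "Layout", "Offer", "Proof", "Speed", "Pricing presentation"}

# Loose check for: If we change X for Y, we expect Z to ... because ...
SHAPE = re.compile(r"^If we change .+ for .+, we expect .+ to .+ because .+", re.IGNORECASE)

def validate_hypotheses(rows):
    """Return a list of problems for rows with 'Hypothesis' and 'Type' keys."""
    problems = []
    if not 15 <= len(rows) <= 25:
        problems.append(f"expected 15-25 rows, got {len(rows)}")
    for i, row in enumerate(rows, start=1):
        if row["Type"] not in ALLOWED_TYPES:
            problems.append(f"row {i}: type {row['Type']!r} is not in the allowed list")
        if not SHAPE.match(row["Hypothesis"]):
            problems.append(f"row {i}: hypothesis does not follow the If / expect / because shape")
    return problems
```

An empty list means the table passes these surface checks; any message points at the row to send back to Claude for a rewrite.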
Step 4: Apply traffic, ethics, and sequencing guardrails

Label each hypothesis Feasible, Stretch, or Do not run yet using your real volumes and brand no-gos.

Claude (Free)
Exact action

1. In the same chat, paste approximate weekly volumes (pick what you have: site visitors, landing page uniques, outbound emails sent, replies, trials, checkouts):

   Weekly visitors or emails by stage:
   - Stage 1: [...]
   - Stage 2: [...]
   (add rows until every stage is covered)
   Ethical / brand no-gos I will not run (list, e.g. hidden fees, fake scarcity, dark patterns, misleading claims): [...]

2. Ask Claude to take the full hypothesis table from Step 3 and add a column:

   Feasibility = Feasible | Stretch | Do not run yet
   Rules:
   - One sentence per row explaining the label.
   - Assume conservative baseline conversion rates when unsure.
   - Mark Do not run yet if learning would likely take more than ~4 weeks at stated volume OR the test conflicts with a no-go.
   - If two tests would compete for the same audience at the same time, note "sequence after [other test]" in that sentence.

3. Require at least three rows labeled Do not run yet unless volume is huge. If Claude marks everything Feasible, reply: "Assume conservative conversion rates; downgrade anything needing more than four weeks at this volume. Re-output the full table with Feasibility column."

Every hypothesis row has Feasible, Stretch, or Do not run yet plus a one-line reason; at least three rows are Do not run yet unless your pasted volumes are very high—in that case Claude states why fewer downgrades apply.
Everything is Feasible—send the downgrade instruction from step 3 and insist on honest time-to-learn.
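To sanity-check Claude's labels, the time-to-learn arithmetic can be sketched as below. This is a rough illustration under stated assumptions, not a power calculation: it treats "learning" as reaching 100 total conversions on the step metric (borrowed from the Step 6 heuristic), and the 2-week boundary between Feasible and Stretch is an assumption, since the guide only fixes the ~4-week cutoff for Do not run yet. The function names are hypothetical.

```python
import math

def weeks_to_learn(weekly_volume, baseline_rate, target_conversions=100):
    """Weeks needed to collect target_conversions at this volume and rate."""
    if weekly_volume <= 0 or baseline_rate <= 0:
        return math.inf
    return target_conversions / (weekly_volume * baseline_rate)

def label_feasibility(weekly_volume, baseline_rate, conflicts_with_no_go=False,
                      max_weeks=4.0, stretch_weeks=2.0):
    """Label a test Feasible, Stretch, or Do not run yet.

    max_weeks mirrors the ~4-week rule in the prompt; stretch_weeks is an
    assumed boundary, not something the guide specifies.
    """
    if conflicts_with_no_go:
        return "Do not run yet"
    weeks = weeks_to_learn(weekly_volume, baseline_rate)
    if weeks > max_weeks:
        return "Do not run yet"
    if weeks > stretch_weeks:
        return "Stretch"
    return "Feasible"
```

For example, a stage with 500 weekly visitors and a conservative 2% baseline yields about 10 conversions a week, so 100 conversions would take roughly 10 weeks and the test lands in Do not run yet.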
Step 5: Prioritize to the top 5 (ICE)

Score Impact, Confidence, and Ease 1–5 each; sort and keep five ranked tests with a one-line why now.

Claude (Free)
Exact action

1. In the same chat, paste:

   Using only rows labeled Feasible or Stretch from the latest table, score each row:
   - Impact (1-5): expected effect on the north-star for this batch
   - Confidence (1-5): evidence strength behind the mechanism
   - Ease (1-5): implementation speed for you (S/M/L mapped to numbers is fine)
   ICE_total = Impact + Confidence + Ease (max 15).

2. In the same message, add:

   Sort descending by ICE_total. Output exactly the top 5 rows with columns:
   Rank (1-5) | Stage | Hypothesis | Type | Primary_metric | ICE_total | Why_now (one non-generic sentence)

3. If ties clog the ranking, reply: "Break ties by proximity to money for the north-star; re-output top 5 only."

4. Sanity-check: disagree out loud with at least one rank. If you cannot, ask Claude which assumption would most change the ranking if wrong.

You have exactly five ranked tests with ICE totals and distinct why-now lines; rank 1 has the highest ICE unless you consciously swapped after the tie-break rule.
Scores are all identical—use the tie-break prompt in step 3; if still flat, ask Claude to vary Confidence by evidence strength row by row.
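The ICE arithmetic is simple enough to reproduce yourself if you want to double-check Claude's totals. A minimal sketch, assuming each row is a dict with the keys shown; the function name `rank_by_ice` is hypothetical. Ties keep their input order here, so the "proximity to money" tie-break from step 3 still has to be applied by hand.

```python
def rank_by_ice(rows, top_n=5):
    """Score, sort, and return the top rows by ICE_total.

    Each row needs 'Feasibility', 'Impact', 'Confidence', and 'Ease';
    only Feasible or Stretch rows are considered.
    """
    eligible = [r for r in rows if r["Feasibility"] in ("Feasible", "Stretch")]
    scored = []
    for r in eligible:
        for key in ("Impact", "Confidence", "Ease"):
            if not 1 <= r[key] <= 5:
                raise ValueError(f"{key} must be between 1 and 5, got {r[key]}")
        # ICE_total is a plain sum, so the maximum is 15.
        scored.append({**r, "ICE_total": r["Impact"] + r["Confidence"] + r["Ease"]})
    # sorted() is stable, so tied rows stay in their original order.
    ranked = sorted(scored, key=lambda r: r["ICE_total"], reverse=True)
    return [{"Rank": i, **r} for i, r in enumerate(ranked[:top_n], start=1)]
```

If the totals from this sketch disagree with the chat output, ask Claude to re-output the table with the three component scores visible.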
Step 6: Build the Sheet backlog and fully spec the top 3

Create a reusable Google Sheet: five rows from the shortlist; ranks 1–3 get control, variant, run duration heuristic, and instrumentation.

Google Sheets (Free)
Exact action

1. Open https://sheets.google.com and create a new spreadsheet named: Funnel AB backlog - [YOUR BUSINESS] - [DATE]

2. Row 1 headers (exact text):

   Rank | Stage | Hypothesis | Type | Primary metric | ICE total | Effort S/M/L | Run notes | Falsify / watch-outs | Status

3. Paste the top 5 table from Step 5 into rows 2–6 under those columns (fill Effort S/M/L and Status yourself; Status can be Backlog).

4. For ranks 1–3 only, add four new columns after Status (insert columns or place to the right):

   Control | Variant | Minimum run rule | Instrumentation

   Control and Variant: fill in plain English (what stays vs what changes).
   Minimum run rule: use this heuristic unless you have a power calculator: "Two full business weeks OR 100 conversions on this step metric, whichever is later." Adjust the number only if Claude's volume notes justify it, and write the final rule in the cell.
   Instrumentation: name the exact report, event, or sheet column you will read (e.g. GA4 landing page conversion, ESP click map, checkout step funnel).

5. Optional: freeze row 1 and turn Status into a data validation list: Planned | Running | Done | Killed.

Five data rows exist; ranks 1–3 have Control, Variant, Minimum run rule, and Instrumentation filled; no cell says only "run until significant".
Minimum run rule is vague—replace with the two-weeks-or-100-conversions heuristic (or your adjusted number with one sentence why).
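The "whichever is later" heuristic translates into a one-line calculation, sketched below so you can write a concrete number of days in the Minimum run rule cell. It assumes "two full business weeks" means 14 calendar days and that you can estimate average daily conversions on the step metric; the function name `minimum_run_days` is hypothetical.

```python
import math

def minimum_run_days(daily_conversions, min_days=14, min_conversions=100):
    """Days to run: two weeks or 100 conversions, whichever comes later."""
    if daily_conversions <= 0:
        raise ValueError("daily_conversions must be positive")
    # Round up: a partial day still has to be run in full.
    days_for_conversions = math.ceil(min_conversions / daily_conversions)
    return max(min_days, days_for_conversions)
```

At 5 conversions a day the conversion target is the binding constraint (20 days); at 20 a day the two-week floor applies (14 days).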

All done!

You now have: A prioritized backlog of funnel A/B test ideas—each with a clear hypothesis, primary metric, stage, effort tags, and a one-line implementation note—ready to run or hand to a designer or developer.
