A/B Testing for Support
A/B testing removes guesswork from optimization. Instead of assuming what works, you measure it.
What to Test
AI Response Style
- Formal vs. conversational tone
- Short vs. detailed responses
- With vs. without emojis
- Bullet points vs. paragraphs
Escalation Thresholds
- 0.85 confidence vs. 0.70 confidence
- Keyword triggers vs. sentiment triggers
- Immediate escalation vs. "Let me try one more thing"
Proactive Triggers
- Different messages for the same trigger
- Trigger timing (5 seconds vs. 15 seconds delay)
- Different customer segments
Knowledge Base Content
- Different article formats for the same topic
- Q&A format vs. step-by-step guide
- With vs. without screenshots
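Whichever of these you test, each conversation has to land in exactly one variant and stay there for the whole conversation. The sketch below shows one common way to do that: hash a conversation ID into a bucket and map buckets to variants by weight. It is an illustration only, not a description of how Corebee assigns traffic; the function name `assign_variant`, the ID format, and the experiment label are all hypothetical.

```python
import hashlib


def assign_variant(conversation_id: str, experiment: str,
                   variants: list[tuple[str, int]]) -> str:
    """Deterministically pick a variant in proportion to its weight.

    `variants` is a list of (name, weight) pairs, e.g. [("control", 50), ("short", 50)].
    All names here are hypothetical and chosen for illustration.
    """
    total = sum(weight for _, weight in variants)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Hash the experiment name together with the conversation ID so the same
    # conversation always lands in the same bucket for a given experiment.
    key = f"{experiment}:{conversation_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % total

    # Walk the cumulative weights until the bucket falls inside a variant's range.
    cumulative = 0
    for name, weight in variants:
        cumulative += weight
        if bucket < cumulative:
            return name
    raise AssertionError("unreachable: bucket is always below the total weight")
```

Hashing rather than drawing a random number on every message means a customer never sees the tone or response length flip in the middle of a conversation.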
Setting Up an A/B Test
- Define your hypothesis. Example: "Shorter AI responses will increase CSAT by 10%."
- Choose your metric. Pick one primary metric to measure (CSAT, resolution rate, or escalation rate).
- Calculate sample size. You need enough conversations to reach statistical significance. Rule of thumb: at least 200 conversations per variant, and more when the effect you expect is small.
- Configure the test. In Corebee, go to Settings > Experiments:
{
  "name": "Response Length Test",
  "variants": [
    { "name": "control", "config": { "max_response_length": 200 }, "weight": 50 },
    { "name": "short", "config": { "max_response_length": 100 }, "weight": 50 }
  ],
  "metric": "csat_score",
  "minimum_sample": 200,
  "duration_days": 14
}
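- Run and wait. Do not peek at results early or stop the test prematurely; checking repeatedly and stopping at the first significant reading inflates the false-positive rate.
- Analyze results. Check statistical significance before drawing conclusions.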
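The sample-size step above can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a rough sketch under stated assumptions: the metric is treated as a rate (for example, the share of conversations rated satisfied), the test is two-sided, and the defaults of 5% significance and 80% power are conventional choices rather than Corebee settings. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, expected: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate conversations needed per variant to detect baseline -> expected.

    Both rates are fractions between 0 and 1 (e.g. 0.70 for 70% satisfied).
    """
    if not (0 < baseline < 1 and 0 < expected < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    if baseline == expected:
        raise ValueError("expected rate must differ from the baseline")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power

    # Sum of the binomial variances of the two groups.
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    n = (z_alpha + z_beta) ** 2 * variance / (expected - baseline) ** 2
    return math.ceil(n)
```

For instance, detecting a move from a 70% to a 77% satisfied rate with these defaults calls for roughly 620 conversations per variant, well above the 200 floor. The smaller the lift in your hypothesis, the more conversations you need.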
Reading Results
Treat a result as statistically significant only when all of the following hold:
- The p-value is below 0.05 (a 5% significance level, commonly described as 95% confidence)
- The sample size meets your minimum
- The test ran for the full planned duration
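The three checks above can be sketched in code. The example uses a pooled two-proportion z-test, which fits rate metrics such as resolution rate or the share of satisfied ratings; a metric measured as an average score would call for a different test. The function names are hypothetical, and reading `minimum_sample` as a per-variant minimum is an assumption, not something the config confirms.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(success_a: int, total_a: int,
                           success_b: int, total_b: int) -> float:
    """Two-sided p-value for the difference between two rates (pooled z-test)."""
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 1.0  # both groups are all-success or all-failure: no evidence of a difference
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_conclusive(p_value: float, total_a: int, total_b: int, days_run: int,
                  minimum_sample: int = 200, duration_days: int = 14,
                  alpha: float = 0.05) -> bool:
    """Apply all three checks: p-value, sample size per variant, full duration."""
    return (p_value < alpha
            and min(total_a, total_b) >= minimum_sample  # assumed to be per variant
            and days_run >= duration_days)
```

A low p-value by itself is not enough: `is_conclusive` returns `False` for a test stopped on day 7 of 14 even if the difference already looks significant.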
Common Pitfalls
- Testing too many things at once: change one variable per test
- Stopping early: exciting early results often regress to the mean
- Ignoring segments: a variant might help one customer type and hurt another
- No documentation: record every test and result for future reference
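To guard against the segment pitfall, break the primary metric down by customer segment before you roll out a winner. The sketch below assumes you can export one row per conversation as a `(segment, variant, satisfied)` tuple; that export shape and the segment names in the usage example are hypothetical.

```python
from collections import defaultdict


def rates_by_segment(rows):
    """Return {segment: {variant: (satisfied_rate, conversation_count)}}.

    `rows` is an iterable of (segment, variant, satisfied) tuples,
    where `satisfied` is a bool.
    """
    # tallies[segment][variant] = [satisfied_count, total_count]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for segment, variant, satisfied in rows:
        cell = tallies[segment][variant]
        cell[0] += int(satisfied)
        cell[1] += 1

    return {
        segment: {
            variant: (satisfied / total, total)
            for variant, (satisfied, total) in by_variant.items()
        }
        for segment, by_variant in tallies.items()
    }


# Usage with made-up rows: the "short" variant helps one segment and hurts the other.
rows = [
    ("new", "control", False), ("new", "control", True),
    ("new", "short", True), ("new", "short", True),
    ("enterprise", "control", True), ("enterprise", "control", True),
    ("enterprise", "short", False), ("enterprise", "short", True),
]
breakdown = rates_by_segment(rows)
```

Each segment slice contains fewer conversations than the full test, so treat per-segment differences as leads for a follow-up test rather than as conclusions. The count returned next to each rate is there to make thin slices easy to spot.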
Building a Testing Culture
- Run one test at a time to avoid interference
- Maintain a backlog of test ideas
- Share results with the team monthly
- Apply learnings immediately
Next up: Calculating the actual ROI of your AI support investment.