A/B Testing Workflow | Experiment Tracking

A/B testing requires systematic experiment tracking: hypothesis, implementation, analysis, and decision. GitScrum organizes experiments with labels and documents results in NoteVault.

How do you manage A/B testing in a development workflow?

Manage A/B testing by creating one experiment task per test that states the hypothesis, the success metrics, and the required sample size. Track each task through the stages hypothesis → implementation → running → analysis → decision. Document results in NoteVault so the learning outlasts the experiment. Use experiment labels, and require statistical significance before declaring a winner.

Experiment labels

Label	Purpose
experiment	A/B test task
exp:hypothesis	Hypothesis phase
exp:implementing	Building variants
exp:running	Experiment live
exp:analyzing	Data analysis
exp:winner	Winning variant
exp:inconclusive	No clear winner
exp:loser	Treatment lost; the original (control) performed better
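
Stage labels are easiest to trust when a task can only move forward through them in order. The sketch below is a minimal Python illustration of that idea. The label names come from the table above, but the transition rules and the helper names `next_labels` and `can_move` are assumptions made for this example, not a GitScrum feature.

```python
# Hypothetical sketch: label names are from the table above; the allowed
# transitions are an assumption for illustration only.
STAGE_ORDER = ["exp:hypothesis", "exp:implementing", "exp:running", "exp:analyzing"]
OUTCOMES = {"exp:winner", "exp:inconclusive", "exp:loser"}


def next_labels(current: str) -> set[str]:
    """Return the labels a task may move to from its current label."""
    if current == "exp:inconclusive":
        # Assumption: an inconclusive test may be re-run with more traffic.
        return {"exp:running"}
    if current in OUTCOMES:
        return set()  # winner and loser are treated as terminal
    idx = STAGE_ORDER.index(current)  # raises ValueError for unknown labels
    if idx == len(STAGE_ORDER) - 1:
        return set(OUTCOMES)  # analysis ends in exactly one outcome label
    return {STAGE_ORDER[idx + 1]}


def can_move(current: str, target: str) -> bool:
    """True if moving from `current` to `target` follows the stage order."""
    return target in next_labels(current)
```

With these rules, `can_move("exp:hypothesis", "exp:running")` is false, which blocks the "no hypothesis review, straight to launch" shortcut.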

A/B test board columns

Column	Purpose
Hypothesis	Proposed experiments
Design	Variant design
Implementation	Building test
Running	Live experiment
Analysis	Evaluating results
Decision	Outcome decided (winner, loser, or inconclusive)
Rollout	Implementing winner

A/B test task template

## Experiment: [Hypothesis in One Line]

### Hypothesis
If we [change], then [metric] will [improve/decrease] because [reason].

### Metrics
- Primary: Conversion rate
- Secondary: Time on page, bounce rate
- Guardrail: Load time (must not regress)

### Variants
| Variant | Description |
|---------|-------------|
| Control (A) | Current design |
| Treatment (B) | New CTA button color |

### Sample Size
- Minimum: 1,000 users per variant
- Expected duration: 14 days
- Traffic split: 50/50

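### Success Criteria
- Minimum detectable effect: 5% (relative to control)
- Statistical significance: 95% confidence (p < 0.05)
- Winner if: Treatment beats Control by at least 5% (relative) and the difference is statistically significant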

### Implementation
- [ ] Feature flag setup
- [ ] Variant A (control)
- [ ] Variant B (treatment)
- [ ] Analytics events
- [ ] QA both variants

### Analysis
Start date: [Date]
End date: [Date]
Sample achieved: [Number]

Results:
| Metric | Control | Treatment | Difference | Significant? |
|--------|---------|-----------|------------|--------------|
| Conv rate | 3.2% | 3.8% | +18.8% (relative) | Yes (p=0.02) |

### Decision
[Winner/Loser/Inconclusive] - [Rationale]

### Follow-up
- [ ] Roll out winner to 100%
- [ ] Clean up feature flags
- [ ] Document learnings
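
The Sample Size, Success Criteria, and Analysis sections of the template can be sanity-checked with a few lines of Python. The sketch below uses the standard normal-approximation formulas for comparing two proportions. It assumes a two-sided test, 80% power, and that the 5% minimum detectable effect is relative to the baseline. The function names and the user counts are hypothetical, chosen only to match the 3.2% and 3.8% rates shown in the example results row.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference in conversion rates (pooled z-test)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Baseline 3.2% with a 5% relative MDE (assumed reading of the template).
needed = sample_size_per_variant(baseline=0.032, relative_mde=0.05)
print(f"Users needed per variant: {needed}")

# Hypothetical counts: 10,000 users per variant at 3.2% and 3.8% conversion.
p_value = two_proportion_p_value(conv_a=320, n_a=10_000, conv_b=380, n_b=10_000)
print(f"p-value: {p_value:.3f}")
```

Under these assumptions the required sample comes out near 195,000 users per variant, so the 1,000-user minimum in the template is best read as a placeholder to replace with a calculated value. The hypothetical counts give a p-value of about 0.02, in line with the example results row.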

Experiment workflow

1. Hypothesis - Define what you're testing and why
2. Design - Create variant designs
3. Implementation - Build with feature flags
4. QA - Test both variants
5. Launch - Start experiment
6. Monitor - Watch for issues
7. Wait - Let it run to sample size
8. Analyze - Statistical analysis
9. Decide - Pick winner or iterate
10. Rollout - Implement decision
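
The Implementation step builds both variants behind a feature flag, which needs a stable way to decide which users see which variant. One common approach is deterministic hash bucketing, sketched below in Python. The function name `assign_variant`, the experiment ID format, and the user IDs are hypothetical; a real feature-flag service would normally handle this for you.

```python
import hashlib


def assign_variant(experiment_id: str, user_id: str,
                   treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to 'control' or 'treatment'."""
    # Hash experiment and user together so the same user can land in
    # different buckets for different experiments.
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


# The same user always gets the same variant for a given experiment.
print(assign_variant("exp-042", "user-123"))

# Over many users the split approaches the configured 50/50 share.
assignments = [assign_variant("exp-042", f"user-{i}") for i in range(10_000)]
print(assignments.count("treatment"), assignments.count("control"))
```

Because the assignment depends only on the two IDs, no per-user state has to be stored, and a returning user never flips between variants mid-experiment.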

NoteVault experiment log

# Experiment Results Log

## 2025-Q1 Experiments

### Exp-042: Blue CTA Button
- Hypothesis: Blue button increases clicks
- Result: Winner (+18% conversion)
- Decision: Rolled out
- Learning: High-contrast CTAs perform better

### Exp-041: Simplified Checkout
- Hypothesis: Fewer fields = more completions
- Result: Inconclusive (4% lift, not significant)
- Decision: Run longer with more traffic
- Learning: Need larger sample for checkout tests

### Exp-040: Exit Intent Popup
- Hypothesis: Popup saves abandoning users
- Result: Loser (-5% satisfaction, +2% saves)
- Decision: Not implemented
- Learning: Popups hurt brand perception

Common A/B test mistakes

Mistake	Prevention
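Stopping early	Wait until the planned sample size is reached
No hypothesis	Require a written hypothesis before implementation
Multiple changes	Test one variable per experiment
Ignoring guardrails	Monitor guardrail metrics for side effects
No documentation	Record every result, including losers and inconclusive tests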
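
Several of these preventions can be enforced mechanically when a task moves to the decision stage. The Python sketch below is one possible rule set, not a GitScrum feature. The `ExperimentResult` fields, the `decide` function, and the default thresholds (5% relative lift, p < 0.05) are assumptions that mirror the example task template.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    sample_per_variant: int    # smallest sample reached by any variant
    required_sample: int       # minimum from the task's Sample Size section
    relative_lift: float       # (treatment - control) / control
    p_value: float             # from the significance test on the primary metric
    guardrail_regressed: bool  # e.g. load time got worse


def decide(result: ExperimentResult, min_lift: float = 0.05,
           alpha: float = 0.05) -> str:
    """Map a result to an outcome label, or raise if the test stopped early."""
    if result.sample_per_variant < result.required_sample:
        # Guards against the "stopping early" mistake.
        raise ValueError("Sample size not reached: keep the experiment running")
    if result.guardrail_regressed:
        return "exp:loser"  # a guardrail regression blocks rollout
    if result.p_value >= alpha:
        return "exp:inconclusive"
    if result.relative_lift >= min_lift:
        return "exp:winner"
    if result.relative_lift < 0:
        return "exp:loser"
    return "exp:inconclusive"  # significant, but smaller than the required lift


# Hypothetical result: sample reached, significant lift, no guardrail regression.
label = decide(ExperimentResult(12_000, 10_000, 0.1875, 0.02, False))
print(label)  # exp:winner
```

Raising an error for an undersized sample, instead of returning a label, keeps an early peek from being recorded as a result.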