A/B Testing Workflow | Experiment Tracking

A/B testing requires systematic experiment tracking: hypothesis, implementation, analysis, and decision. GitScrum organizes experiments with labels and documents results in NoteVault.

How do you manage A/B testing in a development workflow?

Manage A/B testing by creating one experiment task per test, each stating its hypothesis, success metrics, and sample size requirements. Track every task through the stages hypothesis → implementation → running → analysis → decision. Document results in NoteVault for institutional learning. Apply experiment labels to these tasks, and require statistical significance before declaring a winner.
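
As an illustration of the "require statistical significance" step, here is a minimal sketch of a two-sided two-proportion z-test using only the Python standard library. The function name and the counts are hypothetical, and the pooled z-test is one common choice among several; your analytics tool may use a different method.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation in either arm, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 3.2% vs 3.8% conversion on 10,000 users per variant.
p = two_proportion_p_value(conv_a=320, n_a=10_000, conv_b=380, n_b=10_000)
print(f"p-value: {p:.4f}, significant at 95%: {p < 0.05}")
```

With these assumed counts the p-value comes out near 0.02. The same rates on only 1,000 users per variant would not reach significance, which is why sample size belongs in the task before the experiment starts.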

Experiment labels

| Label | Purpose |
|-------|---------|
| experiment | A/B test task |
| exp:hypothesis | Hypothesis phase |
| exp:implementing | Building variants |
| exp:running | Experiment live |
| exp:analyzing | Data analysis |
| exp:winner | Winning variant |
| exp:inconclusive | No clear winner |
| exp:loser | Original better |

A/B test board columns

| Column | Purpose |
|--------|---------|
| Hypothesis | Proposed experiments |
| Design | Variant design |
| Implementation | Building test |
| Running | Live experiment |
| Analysis | Evaluating results |
| Decision | Winner selected |
| Rollout | Implementing winner |
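
If you want to enforce the column order in a script or a review checklist, the board can be modeled as an ordered enum. This is only a sketch: the `Stage` enum and the `can_move` rule (one column forward at a time, any distance backward for rework) are assumptions for illustration, not GitScrum behavior.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Board columns in the order listed above."""
    HYPOTHESIS = 1
    DESIGN = 2
    IMPLEMENTATION = 3
    RUNNING = 4
    ANALYSIS = 5
    DECISION = 6
    ROLLOUT = 7


def can_move(current: Stage, target: Stage) -> bool:
    """Assumed policy: advance one column at a time, or move back for rework."""
    return target != current and target <= current + 1


print(can_move(Stage.HYPOTHESIS, Stage.DESIGN))   # next column
print(can_move(Stage.HYPOTHESIS, Stage.RUNNING))  # skips Design and Implementation
print(can_move(Stage.ANALYSIS, Stage.RUNNING))    # back to Running to collect more data
```

Allowing backward moves covers the case where an inconclusive analysis sends an experiment back to Running for more traffic.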

A/B test task template

## Experiment: [Hypothesis in One Line]

### Hypothesis
If we [change], then [metric] will [improve/decrease] because [reason].

### Metrics
- Primary: Conversion rate
- Secondary: Time on page, bounce rate
- Guardrail: Load time (must not regress)

### Variants
| Variant | Description |
|---------|-------------|
| Control (A) | Current design |
| Treatment (B) | New CTA button color |

### Sample Size
- Minimum: 1,000 users per variant
- Expected duration: 14 days
- Traffic split: 50/50

### Success Criteria
- Minimum detectable effect: 5% (relative lift)
- Statistical significance: 95% confidence (p < 0.05)
- Winner if: Treatment beats Control by at least 5% (relative) and the result is significant

### Implementation
- [ ] Feature flag setup
- [ ] Variant A (control)
- [ ] Variant B (treatment)
- [ ] Analytics events
- [ ] QA both variants

### Analysis
Start date: [Date]
End date: [Date]
Sample achieved: [Number]

Results:
| Metric | Control | Treatment | Difference | Significant? |
|--------|---------|-----------|------------|--------------|
| Conversion rate | 3.2% | 3.8% | +18.8% (relative) | Yes (p=0.02) |

### Decision
[Winner/Loser/Inconclusive] - [Rationale]

### Follow-up
- [ ] Roll out winner to 100%
- [ ] Clean up feature flags
- [ ] Document learnings
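
The "Sample Size" numbers in the template are placeholders, and it helps to sanity-check them against the success criteria. The sketch below uses the standard normal-approximation formula for comparing two proportions. It assumes the minimum detectable effect is a relative lift, takes the 3.2% baseline from the example results row, and assumes 80% power, which the template does not specify. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per variant for a two-sided test on two proportions."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # treatment rate if the lift equals the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Assumed inputs: 3.2% baseline conversion, 5% relative minimum detectable effect.
needed = sample_size_per_variant(baseline=0.032, relative_mde=0.05)
print(f"Users needed per variant: {needed:,}")
```

Under these assumptions, detecting a 5% relative lift on a 3.2% baseline needs well over 100,000 users per variant, far more than the 1,000 shown in the template. Replace the template minimum with a figure computed for your own baseline and effect size.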

Experiment workflow:

  • Hypothesis - Define what you're testing and why
  • Design - Create variant designs
  • Implementation - Build with feature flags
  • QA - Test both variants
  • Launch - Start experiment
  • Monitor - Watch for issues
  • Wait - Let it run to sample size
  • Analyze - Statistical analysis
  • Decide - Pick winner or iterate
  • Rollout - Implement decision

NoteVault experiment log

    # Experiment Results Log
    
    ## 2025-Q1 Experiments
    
    ### Exp-042: Blue CTA Button
    - Hypothesis: Blue button increases clicks
    - Result: Winner (+18% conversion)
    - Decision: Rolled out
    - Learning: High-contrast CTAs perform better
    
    ### Exp-041: Simplified Checkout
    - Hypothesis: Fewer fields = more completions
    - Result: Inconclusive (4% lift, not significant)
    - Decision: Run longer with more traffic
    - Learning: Need larger sample for checkout tests
    
    ### Exp-040: Exit Intent Popup
    - Hypothesis: Popup saves abandoning users
    - Result: Loser (-5% satisfaction, +2% saves)
    - Decision: Not implemented
    - Learning: Popups hurt brand perception
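
To keep log entries uniform, the entry text can be generated from a structured record. The sketch below only builds the markdown string in the format shown above. The `ExperimentRecord` class is hypothetical, and no NoteVault or GitScrum API is assumed; the output is meant to be pasted into the log by hand.

```python
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    """One experiment, with the same fields as a log entry above."""
    exp_id: str
    title: str
    hypothesis: str
    result: str
    decision: str
    learning: str

    def to_log_entry(self) -> str:
        """Render the record as a markdown log entry."""
        return "\n".join([
            f"### {self.exp_id}: {self.title}",
            f"- Hypothesis: {self.hypothesis}",
            f"- Result: {self.result}",
            f"- Decision: {self.decision}",
            f"- Learning: {self.learning}",
        ])


record = ExperimentRecord(
    exp_id="Exp-042",
    title="Blue CTA Button",
    hypothesis="Blue button increases clicks",
    result="Winner (+18% conversion)",
    decision="Rolled out",
    learning="High-contrast CTAs perform better",
)
entry = record.to_log_entry()
print(entry)
```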
    

Common A/B test mistakes

| Mistake | Prevention |
|---------|------------|
| Stopping early | Wait for sample size |
| No hypothesis | Require before implementation |
| Multiple changes | Test one variable |
| Ignoring guardrails | Monitor side effects |
| No documentation | Record all results |
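
Several of these preventions can be written down as an explicit decision rule, so that a winner is never declared before the sample size is reached or while a guardrail is regressing. The following is a sketch under stated assumptions: the function and its parameters are hypothetical, and mapping a guardrail regression to `exp:loser` is one possible policy, not something GitScrum prescribes.

```python
def decide_label(sample_per_variant: int, required_sample: int,
                 p_value: float, relative_lift: float,
                 guardrail_ok: bool, alpha: float = 0.05) -> str:
    """Suggest an experiment label from the analysis inputs."""
    if sample_per_variant < required_sample:
        return "exp:running"        # stopping early: keep waiting for sample size
    if not guardrail_ok:
        return "exp:loser"          # assumed policy: a guardrail regression disqualifies
    if p_value >= alpha:
        return "exp:inconclusive"   # no statistically significant difference
    return "exp:winner" if relative_lift > 0 else "exp:loser"


# Hypothetical inputs for illustration.
print(decide_label(500, 1000, p_value=0.01, relative_lift=0.20, guardrail_ok=True))
print(decide_label(1000, 1000, p_value=0.30, relative_lift=0.04, guardrail_ok=True))
print(decide_label(1000, 1000, p_value=0.02, relative_lift=0.1875, guardrail_ok=True))
```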
