Setting Up On-Call Rotations for Development Teams
On-call responsibilities often fall unevenly on senior developers, leading to burnout and resentment. A well-structured rotation distributes the burden fairly, provides clear escalation paths, and compensates on-call time appropriately, so teams can respond to incidents without any one person absorbing the cost in lost personal time or missed sprint commitments.
The On-Call Problem
Why Rotations Fail
ON-CALL DYSFUNCTION:
┌─────────────────────────────────────────────────────────────┐
│ COMMON ON-CALL FAILURES │
├─────────────────────────────────────────────────────────────┤
│ │
│ THE "HERO" PATTERN: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ "Sarah knows the system best, she should handle it" ││
│ │ ││
│ │ Result after 6 months: ││
│ │ • Sarah is exhausted ││
│ │ • Nobody else learned the system ││
│ │ • Sarah leaves the company ││
│ │ • Team panics ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ THE "EVERYONE ALL THE TIME" PATTERN: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ "We all share on-call, so we're all always on-call" ││
│ │ ││
│ │ Result: ││
│ │ • Nobody feels truly off ││
│ │ • Alerts get ignored (someone else will get it) ││
│ │ • Confusion during incidents ││
│ │ • No accountability ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ THE "INVISIBLE BURDEN" PATTERN: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ On-call happens but isn't tracked or compensated ││
│ │ ││
│ │ Result: ││
│ │ • Developers resent being "always available" ││
│ │ • After-hours work isn't recognized ││
│ │ • Sprint commitments suffer ││
│ │ • Work-life balance erodes ││
│ └─────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────┘
Rotation Structure
Designing Fair Rotations
ROTATION DESIGN:
┌─────────────────────────────────────────────────────────────┐
│ CREATING EQUITABLE ON-CALL │
├─────────────────────────────────────────────────────────────┤
│ │
│ ROTATION PATTERNS: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option A: Weekly rotation ││
│ │ ││
│ │ Week 1: Developer A (primary), Developer B (backup) ││
│ │ Week 2: Developer B (primary), Developer C (backup) ││
│ │ Week 3: Developer C (primary), Developer D (backup) ││
│ │ Week 4: Developer D (primary), Developer A (backup) ││
│ │ ││
│ │ Pros: Predictable, full context for the whole week ││
│ │ Cons: Long stretches of being "on" ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option B: Weekday/Weekend split ││
│ │ ││
│ │ Week 1 Weekdays: Developer A ││
│ │ Week 1 Weekend: Developer B ││
│ │ Week 2 Weekdays: Developer B ││
│ │ Week 2 Weekend: Developer C ││
│ │ ││
│ │ Pros: Shorter on-call periods ││
│ │ Cons: More handoffs, context loss ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option C: Follow-the-sun (distributed teams) ││
│ │ ││
│ │ US business hours: US team ││
│ │ EU business hours: EU team ││
│ │ APAC business hours: APAC team ││
│ │ ││
│ │ Pros: No after-hours on-call for anyone ││
│ │ Cons: Requires global team, complex handoffs ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ MINIMUM TEAM SIZE FOR SUSTAINABLE ON-CALL: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ 4+ developers: Weekly rotation with backup ││
│ │ 6+ developers: Weekday/weekend split feasible ││
│ │ 8+ developers: Multiple on-call tiers possible ││
│ │ ││
│ │ Below 4: Consider shared on-call with another team ││
│ │ or managed services for critical monitoring ││
│ └─────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────┘
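If you generate the schedule from a script rather than by hand, Option A's pattern (each week's backup becomes the next week's primary) takes only a few lines. The following is a minimal Python sketch, not a GitScrum or NoteVault feature: `weekly_rotation` is a hypothetical helper, and it assumes each week starts on the date you pass in and runs seven days.

```python
from datetime import date, timedelta


def weekly_rotation(engineers, start, weeks):
    """Build a weekly primary/backup schedule (Option A).

    Each week's backup is the next person in the list, so the backup
    rolls into the primary slot the following week.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary + backup")
    n = len(engineers)
    schedule = []
    for i in range(weeks):
        week_start = start + timedelta(weeks=i)
        week_end = week_start + timedelta(days=6)  # inclusive last day
        primary = engineers[i % n]
        backup = engineers[(i + 1) % n]
        schedule.append((week_start, week_end, primary, backup))
    return schedule


if __name__ == "__main__":
    team = ["Sarah", "Mike", "Alex", "Emma"]
    for start, end, primary, backup in weekly_rotation(team, date(2025, 1, 6), 5):
        print(f"{start:%b %d}-{end:%b %d}: primary={primary}, backup={backup}")
```

With a four-person team starting Monday, January 6, 2025, this yields the same primary/backup pairs as the Q1 schedule in the next section. Swaps and PTO still need manual adjustment afterward.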
GitScrum Implementation
Tracking On-Call in Your Workflow
ON-CALL TRACKING:
┌─────────────────────────────────────────────────────────────┐
│ MAKING ON-CALL VISIBLE │
├─────────────────────────────────────────────────────────────┤
│ │
│ ROTATION SCHEDULE (NoteVault): │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ # Q1 2025 On-Call Schedule ││
│ │ ││
│ │ | Week | Primary | Backup | Notes | ││
│ │ |--------------|---------|---------|----------------| ││
│ │ | Jan 6-12     | Sarah   | Mike    |                | ││
│ │ | Jan 13-19    | Mike    | Alex    |                | ││
│ │ | Jan 20-26    | Alex    | Emma    |                | ││
│ │ | Jan 27-Feb 2 | Emma    | Sarah   | Emma PTO Feb 1 | ││
│ │ | Feb 3-9      | Sarah   | Mike    |                | ││
│ │ ││
│ │ ## Swap requests ││
│ │ - [x] Emma ↔ Sarah for Jan 27 (approved) ││
│ │ ││
│ │ ## Coverage gaps ││
│ │ - None currently ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ INCIDENT TASK TRACKING: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ When incident occurs, create task: ││
│ │ ││
│ │ Title: [INCIDENT] Brief description ││
│ │ ││
│ │ Labels: ││
│ │ type/incident ││
│ │ severity/p1 (or p0, p2, p3) ││
│ │ on-call/january-week-2 ││
│ │ ││
│ │ Details: ││
│ │ - Time detected: 2:34 AM ││
│ │ - Time acknowledged: 2:38 AM ││
│ │ - Time resolved: 3:15 AM ││
│ │ - On-call engineer: Mike ││
│ │ - Time to resolve: 41 min (detected to resolved) ││
│ │ ││
│ │ Link to post-incident review: [...] ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ TIME TRACKING FOR ON-CALL: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Use time tracking to log on-call work: ││
│ │ ││
│ │ Categories: ││
│ │ • on-call/incident-response ││
│ │ • on-call/monitoring-check ││
│ │ • on-call/escalation-support ││
│ │ ││
│ │ Monthly summary: ││
│ │ Mike: 4.5 hours on-call work (3 incidents) ││
│ │ Sarah: 2 hours on-call work (1 incident) ││
│ │ Alex: 6 hours on-call work (4 incidents) ││
│ └─────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────┘
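Recording detected, acknowledged, and resolved timestamps on each incident task makes response metrics easy to derive. The sketch below assumes those three timestamps are available per incident; the `Incident` class, `monthly_summary` function, and the incident date in the usage example are hypothetical. It counts only detection-to-resolution time, so its totals will usually be lower than hours logged in time tracking, which can include follow-up work.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    title: str
    engineer: str
    detected: datetime
    acknowledged: datetime
    resolved: datetime

    @property
    def minutes_to_acknowledge(self) -> float:
        return (self.acknowledged - self.detected).total_seconds() / 60

    @property
    def minutes_to_resolve(self) -> float:
        return (self.resolved - self.detected).total_seconds() / 60


def monthly_summary(incidents):
    """Return {engineer: (hours of incident time, incident count)}."""
    totals = defaultdict(lambda: [0.0, 0])  # [minutes, count]
    for inc in incidents:
        totals[inc.engineer][0] += inc.minutes_to_resolve
        totals[inc.engineer][1] += 1
    return {name: (minutes / 60, count) for name, (minutes, count) in totals.items()}


if __name__ == "__main__":
    # Hypothetical date inside Mike's Jan 13-19 on-call week
    inc = Incident(
        "[INCIDENT] Brief description", "Mike",
        detected=datetime(2025, 1, 15, 2, 34),
        acknowledged=datetime(2025, 1, 15, 2, 38),
        resolved=datetime(2025, 1, 15, 3, 15),
    )
    print(inc.minutes_to_acknowledge, inc.minutes_to_resolve)
    print(monthly_summary([inc]))
```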
Escalation Paths
Clear Escalation Structure
ESCALATION DESIGN:
┌─────────────────────────────────────────────────────────────┐
│ KNOWING WHO TO CALL │
├─────────────────────────────────────────────────────────────┤
│ │
│ TIERED ESCALATION: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ TIER 1: Primary on-call (0-15 min) ││
│ │ • First responder ││
│ │ • Triages and attempts resolution ││
│ │ • Escalates if unresolved after 15 min ││
│ │ ││
│ │ TIER 2: Backup on-call (15-30 min) ││
│ │ • Joins if primary can't resolve ││
│ │ • Provides additional context/expertise ││
│ │ • Escalates if unresolved 30 min after detection ││
│ │ ││
│ │ TIER 3: Engineering lead (30+ min) ││
│ │ • Major incidents only ││
│ │ • Coordinates multi-team response ││
│ │ • Approves major rollbacks/changes ││
│ │ ││
│ │ TIER 4: Executive (P0 incidents) ││
│ │ • Customer-impacting outages ││
│ │ • Handles external communication ││
│ │ • Authorizes extraordinary measures ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ ESCALATION TRIGGERS: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Automatic escalation if: ││
│ │ • No acknowledgment in 5 min → alert backup ││
│ │ • No resolution in 15 min → alert backup ││
│ │ • No resolution in 30 min → alert lead ││
│ │ • Customer impact confirmed → alert executive ││
│ │ ││
│ │ Document in runbook (NoteVault): ││
│ │ • Who to escalate to (names + contact info) ││
│ │ • When to escalate (clear triggers) ││
│ │ • How to escalate (phone, Slack, PagerDuty) ││
│ └─────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────┘
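Paging tools such as PagerDuty let you configure escalation policies with timeouts like these. Before encoding the triggers in a tool, it can help to model them as a plain function to confirm they are cumulative and unambiguous. A minimal sketch, assuming the incident is still unresolved and `minutes_elapsed` is measured from detection; `Tier` and `tiers_to_page` are illustrative names.

```python
from enum import Enum


class Tier(Enum):
    PRIMARY = 1
    BACKUP = 2
    LEAD = 3
    EXECUTIVE = 4


def tiers_to_page(minutes_elapsed: float, acknowledged: bool,
                  customer_impact: bool) -> list[Tier]:
    """Everyone who should have been paged so far for an unresolved incident."""
    paged = [Tier.PRIMARY]  # primary is always paged first
    # Backup: no acknowledgment within 5 min, or no resolution within 15 min
    if (not acknowledged and minutes_elapsed >= 5) or minutes_elapsed >= 15:
        paged.append(Tier.BACKUP)
    # Engineering lead: no resolution within 30 min
    if minutes_elapsed >= 30:
        paged.append(Tier.LEAD)
    # Executive: confirmed customer impact, regardless of elapsed time
    if customer_impact:
        paged.append(Tier.EXECUTIVE)
    return paged
```

Writing the rules this way surfaces one policy choice explicitly: an unacknowledged page reaches the backup at 5 minutes, while an acknowledged incident only pulls in the backup at 15.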
Compensation and Recovery
Recognizing On-Call Burden
COMPENSATION MODELS:
┌─────────────────────────────────────────────────────────────┐
│ FAIR TREATMENT FOR ON-CALL │
├─────────────────────────────────────────────────────────────┤
│ │
│ TIME-BASED COMPENSATION: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option A: Time off in lieu ││
│ │ • 4 hours off for each overnight incident ││
│ │ • 8 hours off for weekend incidents ││
│ │ • Tracked in time tracking system ││
│ │ ││
│ │ Option B: On-call stipend ││
│ │ • Fixed amount per on-call week ││
│ │ • Additional per-incident bonus ││
│ │ • Example range: $200-500/week + $50-100/incident ││
│ │ ││
│ │ Option C: Reduced sprint load ││
│ │ • On-call week = 70% sprint capacity ││
│ │ • Buffer for incident response ││
│ │ • Prevents sprint disruption ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ RECOVERY TIME: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ After significant incident: ││
│ │ ││
│ │ Night incident (2+ hours): ││
│ │ → Start late next day or take half-day off ││
│ │ ││
│ │ Weekend incident (4+ hours): ││
│ │ → Comp day within 2 weeks ││
│ │ ││
│ │ Document in team agreements (NoteVault): ││
│ │ "After any incident requiring 2+ hours outside ││
│ │ business hours, the responder takes equivalent ││
│ │ time off within two weeks." ││
│ └─────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────┘
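To compare Options A and B for a given on-call week, the arithmetic is simple enough to script. This sketch assumes each incident is classified as either overnight or weekend (a weekend overnight incident counts once, as weekend). The function names are hypothetical, and the $300/week and $75/incident rates in the usage example are illustrative values chosen from the range above.

```python
from dataclasses import dataclass

# Option A: hours of time off in lieu per incident (from the policy above)
OVERNIGHT_TOIL_HOURS = 4
WEEKEND_TOIL_HOURS = 8


@dataclass
class OnCallWeek:
    overnight_incidents: int
    weekend_incidents: int


def time_off_in_lieu(week: OnCallWeek) -> int:
    """Option A: hours of time off earned for the week."""
    return (week.overnight_incidents * OVERNIGHT_TOIL_HOURS
            + week.weekend_incidents * WEEKEND_TOIL_HOURS)


def stipend(week: OnCallWeek, weekly_rate: float, per_incident: float) -> float:
    """Option B: fixed weekly amount plus a bonus per incident."""
    incidents = week.overnight_incidents + week.weekend_incidents
    return weekly_rate + incidents * per_incident


if __name__ == "__main__":
    week = OnCallWeek(overnight_incidents=2, weekend_incidents=1)
    print(time_off_in_lieu(week))              # hours off under Option A
    print(stipend(week, weekly_rate=300, per_incident=75))  # pay under Option B
```

For a week with two overnight incidents and one weekend incident, Option A grants 16 hours off and Option B pays $525 at these example rates.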
Sprint Integration
Balancing On-Call with Sprint Work
SPRINT PLANNING:
┌─────────────────────────────────────────────────────────────┐
│ ON-CALL AND DELIVERY WORK │
├─────────────────────────────────────────────────────────────┤
│ │
│ CAPACITY ADJUSTMENT: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Standard capacity: 40 hours/week ││
│ │ ││
│ │ On-call week: ││
│ │ • Primary: 28 hours sprint work (30% reduction) ││
│ │ • Backup: 36 hours sprint work (10% reduction) ││
│ │ ││
│ │ Why reduce capacity: ││
│ │ • Context switching cost ││
│ │ • Potential for interrupted work ││
│ │ • Mental load of being "available" ││
│ │ • Recovery from any incidents ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ TASK SELECTION FOR ON-CALL WEEKS: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Good tasks for on-call week: ││
│ │ ✅ Small, interruptible tasks ││
│ │ ✅ Code reviews ││
│ │ ✅ Documentation ││
│ │ ✅ Runbook updates ││
│ │ ✅ Technical debt items ││
│ │ ││
│ │ Avoid during on-call: ││
│ │ ❌ Deep focus features ││
│ │ ❌ Complex debugging ││
│ │ ❌ Time-sensitive deliverables ││
│ │ ❌ Customer meetings ││
│ └─────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────┘
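The capacity adjustment maps directly onto sprint-planning math. A minimal sketch, assuming a one-week planning window and the 30% (primary) and 10% (backup) reductions above; `weekly_capacity` is a hypothetical helper, not a GitScrum function. Reductions are stored as integer percentages so the results come out as exact hour counts.

```python
STANDARD_HOURS = 40
# Percent reduction in sprint hours during an on-call week
ON_CALL_REDUCTION_PCT = {"primary": 30, "backup": 10}


def weekly_capacity(team, primary, backup, standard_hours=STANDARD_HOURS):
    """Return {person: sprint hours available} for one on-call week."""
    if primary == backup:
        raise ValueError("primary and backup must be different people")
    capacity = {}
    for person in team:
        if person == primary:
            pct = ON_CALL_REDUCTION_PCT["primary"]
        elif person == backup:
            pct = ON_CALL_REDUCTION_PCT["backup"]
        else:
            pct = 0  # not on call this week
        capacity[person] = standard_hours * (100 - pct) / 100
    return capacity


if __name__ == "__main__":
    cap = weekly_capacity(["Sarah", "Mike", "Alex", "Emma"], primary="Sarah", backup="Mike")
    print(cap, sum(cap.values()))
```

For a four-person team, one on-call week costs 16 hours of sprint capacity: 144 of 160 hours remain available.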