Setting Up On-Call Rotations for Development Teams

On-call responsibilities often fall unevenly on senior developers, leading to burnout and resentment. A well-structured rotation distributes the burden fairly, provides clear escalation paths, and compensates on-call time appropriately—so teams can respond to incidents without sacrificing personal time or sprint commitments.

The On-Call Problem

Why Rotations Fail

ON-CALL DYSFUNCTION:
┌─────────────────────────────────────────────────────────────┐
│ COMMON ON-CALL FAILURES                                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ THE "HERO" PATTERN:                                         │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ "Sarah knows the system best, she should handle it"     ││
│ │                                                         ││
│ │ Result after 6 months:                                  ││
│ │ • Sarah is exhausted                                    ││
│ │ • Nobody else learned the system                        ││
│ │ • Sarah leaves the company                              ││
│ │ • Team panics                                           ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ THE "EVERYONE ALL THE TIME" PATTERN:                        │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ "We all share on-call, so we're all always on-call"     ││
│ │                                                         ││
│ │ Result:                                                 ││
│ │ • Nobody feels truly off                                ││
│ │ • Alerts get ignored (someone else will get it)         ││
│ │ • Confusion during incidents                            ││
│ │ • No accountability                                     ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ THE "INVISIBLE BURDEN" PATTERN:                             │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ On-call happens but isn't tracked or compensated        ││
│ │                                                         ││
│ │ Result:                                                 ││
│ │ • Developers resent being "always available"            ││
│ │ • After-hours work isn't recognized                     ││
│ │ • Sprint commitments suffer                             ││
│ │ • Work-life balance erodes                              ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
└─────────────────────────────────────────────────────────────┘

Rotation Structure

Designing Fair Rotations

ROTATION DESIGN:
┌─────────────────────────────────────────────────────────────┐
│ CREATING EQUITABLE ON-CALL                                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ ROTATION PATTERNS:                                          │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option A: Weekly rotation                               ││
│ │                                                         ││
│ │ Week 1: Developer A (primary), Developer B (backup)     ││
│ │ Week 2: Developer B (primary), Developer C (backup)     ││
│ │ Week 3: Developer C (primary), Developer D (backup)     ││
│ │ Week 4: Developer D (primary), Developer A (backup)     ││
│ │                                                         ││
│ │ Pros: Predictable, full context for week                ││
│ │ Cons: Long stretches of being "on"                      ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option B: Weekday/Weekend split                         ││
│ │                                                         ││
│ │ Week 1 Weekdays: Developer A                            ││
│ │ Week 1 Weekend: Developer B                             ││
│ │ Week 2 Weekdays: Developer B                            ││
│ │ Week 2 Weekend: Developer C                             ││
│ │                                                         ││
│ │ Pros: Shorter on-call periods                           ││
│ │ Cons: More handoffs, context loss                       ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option C: Follow-the-sun (distributed teams)            ││
│ │                                                         ││
│ │ US business hours: US team                              ││
│ │ EU business hours: EU team                              ││
│ │ APAC business hours: APAC team                          ││
│ │                                                         ││
│ │ Pros: No after-hours for anyone                         ││
│ │ Cons: Requires global team, complex handoffs            ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ MINIMUM TEAM SIZE FOR SUSTAINABLE ON-CALL:                  │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ 4+ developers: Weekly rotation with backup              ││
│ │ 6+ developers: Weekday/weekend split feasible           ││
│ │ 8+ developers: Multiple on-call tiers possible          ││
│ │                                                         ││
│ │ Below 4: Consider shared on-call with another team      ││
│ │          or managed services for critical monitoring    ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
└─────────────────────────────────────────────────────────────┘
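Options A and B are both round-robins over the team list: in the weekly pattern, this week's backup becomes next week's primary. The following is a minimal Python sketch that assumes a plain list of names and Monday-start weeks. The function names are hypothetical and are not part of any GitScrum API. The team-size mapping simply encodes the minimums listed above.

```python
from datetime import date, timedelta

def weekly_rotation(team, start, weeks):
    """Option A: this week's backup becomes next week's primary."""
    if len(team) < 2:
        raise ValueError("need at least two people for primary + backup")
    schedule = []
    for i in range(weeks):
        primary = team[i % len(team)]
        backup = team[(i + 1) % len(team)]  # next person in the list
        schedule.append((start + timedelta(weeks=i), primary, backup))
    return schedule

def split_rotation(team, weeks):
    """Option B: (weekday owner, weekend owner) for each week."""
    return [(team[i % len(team)], team[(i + 1) % len(team)]) for i in range(weeks)]

def suggested_pattern(team_size):
    """Rough mapping from team size to the minimums listed above."""
    if team_size >= 8:
        return "multiple on-call tiers"
    if team_size >= 6:
        return "weekday/weekend split"
    if team_size >= 4:
        return "weekly rotation with backup"
    return "share on-call with another team"

team = ["Sarah", "Mike", "Alex", "Emma"]
for week_start, primary, backup in weekly_rotation(team, date(2025, 1, 6), 5):
    print(week_start, primary, backup)
```

With the four names from the Q1 schedule and a start date of Monday, January 6, 2025, the loop reproduces that schedule's Primary and Backup columns. Each person is primary for one week in four and backup for one week in four.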

GitScrum Implementation

Tracking On-Call in Your Workflow

ON-CALL TRACKING:
┌─────────────────────────────────────────────────────────────┐
│ MAKING ON-CALL VISIBLE                                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ ROTATION SCHEDULE (NoteVault):                              │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ # Q1 2025 On-Call Schedule                              ││
│ │                                                         ││
│ │ | Week         | Primary | Backup  | Notes          |   ││
│ │ |--------------|---------|---------|----------------|   ││
│ │ | Jan 6-12     | Sarah   | Mike    |                |   ││
│ │ | Jan 13-19    | Mike    | Alex    |                |   ││
│ │ | Jan 20-26    | Alex    | Emma    |                |   ││
│ │ | Jan 27-Feb 2 | Emma    | Sarah   | Emma PTO Feb 1 |   ││
│ │ | Feb 3-9      | Sarah   | Mike    |                |   ││
│ │                                                         ││
│ │ ## Swap requests                                        ││
│ │ - [x] Emma ↔ Sarah for Jan 27 (approved)                ││
│ │                                                         ││
│ │ ## Coverage gaps                                        ││
│ │ - None currently                                        ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ INCIDENT TASK TRACKING:                                     │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ When incident occurs, create task:                      ││
│ │                                                         ││
│ │ Title: [INCIDENT] Brief description                     ││
│ │                                                         ││
│ │ Labels:                                                 ││
│ │ type/incident                                           ││
│ │ severity/p1 (or p2, p3)                                 ││
│ │ on-call/january-week-2                                  ││
│ │                                                         ││
│ │ Details:                                                ││
│ │ - Time detected: 2:34 AM                                ││
│ │ - Time acknowledged: 2:38 AM                            ││
│ │ - Time resolved: 3:15 AM                                ││
│ │ - On-call engineer: Mike                                ││
│ │ - Total on-call time: 41 min                            ││
│ │                                                         ││
│ │ Link to post-incident review: [...]                     ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ TIME TRACKING FOR ON-CALL:                                  │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Use time tracking to log on-call work:                  ││
│ │                                                         ││
│ │ Categories:                                             ││
│ │ • on-call/incident-response                             ││
│ │ • on-call/monitoring-check                              ││
│ │ • on-call/escalation-support                            ││
│ │                                                         ││
│ │ Monthly summary:                                        ││
│ │ Mike: 4.5 hours on-call work (3 incidents)              ││
│ │ Sarah: 2 hours on-call work (1 incident)                ││
│ │ Alex: 6 hours on-call work (4 incidents)                ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
└─────────────────────────────────────────────────────────────┘
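The timestamps recorded on an incident task are enough to derive response metrics, and those records can be rolled up into a per-engineer summary. The following is a minimal sketch. It assumes each incident stores detected, acknowledged, and resolved times plus the engineer's name. The `Incident` type and helper names are hypothetical, and the incident date is assumed because the example task gives only times of day.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Incident:
    engineer: str
    detected: datetime
    acknowledged: datetime
    resolved: datetime

    def minutes_to_ack(self) -> int:
        return int((self.acknowledged - self.detected).total_seconds() // 60)

    def minutes_to_resolve(self) -> int:
        return int((self.resolved - self.detected).total_seconds() // 60)

def monthly_summary(incidents):
    """Per engineer: (hours from detection to resolution, incident count)."""
    minutes = defaultdict(int)
    counts = defaultdict(int)
    for inc in incidents:
        minutes[inc.engineer] += inc.minutes_to_resolve()
        counts[inc.engineer] += 1
    return {name: (round(minutes[name] / 60, 1), counts[name]) for name in minutes}

# The example incident; the date is assumed (it falls in Mike's week).
mike = Incident(
    "Mike",
    detected=datetime(2025, 1, 15, 2, 34),
    acknowledged=datetime(2025, 1, 15, 2, 38),
    resolved=datetime(2025, 1, 15, 3, 15),
)
print(mike.minutes_to_ack(), mike.minutes_to_resolve())  # 4 41
```

This summary counts only detection-to-resolution time. Work logged under `on-call/monitoring-check` or `on-call/escalation-support` would need to be added from the time-tracking entries.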

Escalation Paths

Clear Escalation Structure

ESCALATION DESIGN:
┌─────────────────────────────────────────────────────────────┐
│ KNOWING WHO TO CALL                                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ TIERED ESCALATION:                                          │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ TIER 1: Primary on-call (0-15 min)                      ││
│ │ • First responder                                       ││
│ │ • Triages and attempts resolution                       ││
│ │ • Escalates if can't resolve in 15 min                  ││
│ │                                                         ││
│ │ TIER 2: Backup on-call (15-30 min)                      ││
│ │ • Joins if primary can't resolve                        ││
│ │ • Provides additional context/expertise                 ││
│ │ • Escalates if can't resolve in 30 min                  ││
│ │                                                         ││
│ │ TIER 3: Engineering lead (30+ min)                      ││
│ │ • Major incidents only                                  ││
│ │ • Coordinates multi-team response                       ││
│ │ • Approves major rollbacks/changes                      ││
│ │                                                         ││
│ │ TIER 4: Executive (P0 incidents)                        ││
│ │ • Customer-impacting outages                            ││
│ │ • Handles external communication                        ││
│ │ • Authorizes extraordinary measures                     ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ ESCALATION TRIGGERS:                                        │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Automatic escalation if:                                ││
│ │ • No acknowledgment in 5 min → alert backup             ││
│ │ • No resolution in 15 min → alert backup                ││
│ │ • No resolution in 30 min → alert lead                  ││
│ │ • Customer impact confirmed → alert executive           ││
│ │                                                         ││
│ │ Document in runbook (NoteVault):                        ││
│ │ • Who to escalate to (names + contact info)             ││
│ │ • When to escalate (clear triggers)                     ││
│ │ • How to escalate (phone, Slack, PagerDuty)             ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
└─────────────────────────────────────────────────────────────┘
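The automatic triggers form a small rule table that is evaluated against elapsed time. The following is a minimal sketch that assumes the listed thresholds (5, 15, and 30 minutes) and some scheduler that re-checks each open incident. The function and role names are illustrative; this is not a PagerDuty API.

```python
from typing import List

def roles_to_page(minutes_open: int, acknowledged: bool,
                  customer_impact: bool) -> List[str]:
    """Roles that should have been paged so far for an unresolved incident.

    Thresholds mirror the escalation triggers; adjust them to your runbook.
    """
    roles = ["primary"]  # primary is always paged first
    # Backup: no acknowledgment within 5 min, or no resolution within 15 min.
    if (not acknowledged and minutes_open >= 5) or minutes_open >= 15:
        roles.append("backup")
    if minutes_open >= 30:
        roles.append("engineering lead")
    if customer_impact:
        roles.append("executive")
    return roles

print(roles_to_page(6, acknowledged=False, customer_impact=False))
```

Customer impact is modeled as an independent flag rather than a time threshold, because the trigger list escalates to an executive as soon as impact is confirmed.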

Compensation and Recovery

Recognizing On-Call Burden

COMPENSATION MODELS:
┌─────────────────────────────────────────────────────────────┐
│ FAIR TREATMENT FOR ON-CALL                                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ TIME-BASED COMPENSATION:                                    │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Option A: Time off in lieu                              ││
│ │ • 4 hours off for each overnight incident               ││
│ │ • 8 hours off for weekend incidents                     ││
│ │ • Tracked in time tracking system                       ││
│ │                                                         ││
│ │ Option B: On-call stipend                               ││
│ │ • Fixed amount per on-call week                         ││
│ │ • Additional per-incident bonus                         ││
│ │ • Common: $200-500/week + $50-100/incident              ││
│ │                                                         ││
│ │ Option C: Reduced sprint load                           ││
│ │ • On-call week = 70% sprint capacity                    ││
│ │ • Buffer for incident response                          ││
│ │ • Prevents sprint disruption                            ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ RECOVERY TIME:                                              │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ After significant incident:                             ││
│ │                                                         ││
│ │ Night incident (2+ hours):                              ││
│ │ → Start late next day or take half-day off              ││
│ │                                                         ││
│ │ Weekend incident (4+ hours):                            ││
│ │ → Comp day within 2 weeks                               ││
│ │                                                         ││
│ │ Document in team agreements (NoteVault):                ││
│ │ "After any incident requiring 2+ hours outside          ││
│ │  business hours, the responder takes equivalent         ││
│ │  time off within the same week."                        ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
└─────────────────────────────────────────────────────────────┘
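Options A and B and the recovery thresholds all reduce to simple arithmetic. The following is a minimal sketch with hypothetical function names. The stipend rates are parameters because the figures above are a range rather than a rule. The recovery rules encode only the night (2+ hours) and weekend (4+ hours) cases.

```python
from typing import Optional

def toil_hours(overnight_incidents: int, weekend_incidents: int) -> int:
    """Option A: 4 hours off per overnight incident, 8 per weekend incident."""
    return 4 * overnight_incidents + 8 * weekend_incidents

def stipend(weeks_on_call: int, incidents: int,
            weekly_rate: float, per_incident: float) -> float:
    """Option B: fixed amount per on-call week plus a per-incident bonus."""
    return weeks_on_call * weekly_rate + incidents * per_incident

def recovery_action(hours_spent: float, weekend: bool) -> Optional[str]:
    """Recovery rules for out-of-hours incidents: night 2+ h, weekend 4+ h."""
    if weekend and hours_spent >= 4:
        return "comp day within 2 weeks"
    if not weekend and hours_spent >= 2:
        return "late start or half day off"
    return None

# Example rates from the low end of the quoted range.
print(stipend(1, 3, weekly_rate=200, per_incident=50))  # 350
```

`recovery_action` treats any non-weekend call as a night incident, so call it only for work done outside business hours.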

Sprint Integration

Balancing On-Call with Sprint Work

SPRINT PLANNING:
┌─────────────────────────────────────────────────────────────┐
│ ON-CALL AND DELIVERY WORK                                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ CAPACITY ADJUSTMENT:                                        │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Standard capacity: 40 hours/week                        ││
│ │                                                         ││
│ │ On-call week:                                           ││
│ │ • Primary: 28 hours sprint work (30% reduction)         ││
│ │ • Backup: 36 hours sprint work (10% reduction)          ││
│ │                                                         ││
│ │ Why reduce capacity:                                    ││
│ │ • Context switching cost                                ││
│ │ • Potential for interrupted work                        ││
│ │ • Mental load of being "available"                      ││
│ │ • Recovery from any incidents                           ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
│ TASK SELECTION FOR ON-CALL WEEKS:                           │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Good tasks for on-call week:                            ││
│ │ ✅ Small, interruptible tasks                           ││
│ │ ✅ Code reviews                                         ││
│ │ ✅ Documentation                                        ││
│ │ ✅ Runbook updates                                      ││
│ │ ✅ Technical debt items                                 ││
│ │                                                         ││
│ │ Avoid during on-call:                                   ││
│ │ ❌ Deep focus features                                  ││
│ │ ❌ Complex debugging                                    ││
│ │ ❌ Time-sensitive deliverables                          ││
│ │ ❌ Customer meetings                                    ││
│ └─────────────────────────────────────────────────────────┘│
│                                                             │
└─────────────────────────────────────────────────────────────┘
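The capacity adjustment can be applied during sprint planning by discounting each person's hours according to their on-call role. The following is a minimal sketch for a one-week sprint that uses the 30% and 10% reductions above. Reductions are stored as integer percentages to keep the arithmetic exact, and the function names are hypothetical.

```python
from typing import Optional

STANDARD_HOURS = 40
# Percent of sprint capacity held back for on-call duties.
REDUCTION_PCT = {"primary": 30, "backup": 10}

def sprint_hours(role: Optional[str], hours: int = STANDARD_HOURS) -> float:
    """Hours available for sprint work given a member's on-call role (or None)."""
    pct = REDUCTION_PCT.get(role, 0)  # no reduction for people not on call
    return hours * (100 - pct) / 100

def team_capacity(team, primary: str, backup: str) -> float:
    """Total sprint hours for a one-week sprint with the given on-call pair."""
    total = 0.0
    for member in team:
        if member == primary:
            role = "primary"
        elif member == backup:
            role = "backup"
        else:
            role = None
        total += sprint_hours(role)
    return total

print(team_capacity(["Sarah", "Mike", "Alex", "Emma"], primary="Sarah", backup="Mike"))  # 144.0
```

For a four-person team, on-call duty removes 16 of 160 hours (10%) from the week's plannable capacity.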