Measuring Team Performance Effectively
Measuring team performance is essential but dangerous. The wrong metrics create perverse incentives, gaming, and dysfunction. The right metrics illuminate reality, drive improvement, and respect complexity. Focus on outcomes over outputs, trends over snapshots, and team over individual metrics.
Metrics Philosophy
| Good Metrics | Bad Metrics |
|---|---|
| Team-level | Individual |
| Outcomes | Outputs |
| Trends over time | Point-in-time |
| Lead to improvement | Lead to gaming |
| Multiple together | Single in isolation |
Dangerous Metrics
What NOT to Measure
METRICS THAT CAUSE HARM
═══════════════════════
LINES OF CODE:
─────────────────────────────────────
Problem: More code ≠ better
Gaming: Verbose code, no refactoring
Result: Bloated, unmaintainable codebase
COMMITS PER DAY:
─────────────────────────────────────
Problem: Quantity ≠ quality
Gaming: Tiny commits, splitting work
Result: Noisy history, no real improvement
HOURS WORKED:
─────────────────────────────────────
Problem: Presence ≠ productivity
Gaming: Stay late doing nothing
Result: Burnout, resentment
BUGS FOUND (for testers):
─────────────────────────────────────
Problem: Incentivizes finding bugs over preventing them
Gaming: Report minor issues, split bugs
Result: Focus on bugs, not quality
INDIVIDUAL VELOCITY:
─────────────────────────────────────
Problem: Discourages helping others
Gaming: Inflate estimates
Result: Team dysfunction
COMPARING TEAM VELOCITIES:
─────────────────────────────────────
Problem: Teams estimate differently
Gaming: Point inflation arms race
Result: Velocity loses meaning
Goodhart's Law in Action
GOODHART'S LAW EXAMPLES
═══════════════════════
"When a measure becomes a target,
it ceases to be a good measure."
EXAMPLE 1: Code Coverage Target
─────────────────────────────────────
Target: 80% code coverage
Gaming:
├── Tests that don't assert anything
├── Testing trivial getters/setters
├── Ignoring complex untestable code
└── Coverage high, confidence low
Better:
├── Track coverage but don't target
├── Focus on test quality
├── Review tests in code review
└── Use mutation testing to check tests catch real faults
EXAMPLE 2: Ticket Closure Target
─────────────────────────────────────
Target: Close 20 tickets/sprint
Gaming:
├── Split tickets artificially
├── Close without proper fix
├── Avoid hard tickets
└── Quantity up, value down
Better:
├── Track cycle time (less gameable)
├── Focus on value delivered
├── Review completion quality
└── Celebrate impact, not count
EXAMPLE 3: Response Time Target
─────────────────────────────────────
Target: Respond to bugs in < 4 hours
Gaming:
├── Quick "acknowledged" responses
├── No actual progress
├── Gaming the clock
└── Fast response, slow resolution
Better:
├── Track time to resolution
├── Track customer satisfaction
├── Address root causes
└── Measure end-to-end quality
Healthy Metrics
Team Delivery Metrics
DELIVERY METRICS THAT WORK
══════════════════════════
CYCLE TIME:
─────────────────────────────────────
Definition: Time from work started to done
Why good: Hard to game, measures flow
Track: Trend over time
Target: Decrease (faster delivery)
┌───────────────────────────────────────────┐
│ Cycle Time Trend (days)                   │
│ ─────────────────────────────────         │
│ 12 │                                      │
│  9 │ ●                                    │
│  6 │    ●  ●  ●                           │
│  3 │             ●  ●  ●  ●               │
│  0 └────────────────────────              │
│      S1 S2 S3 S4 S5 S6 S7 S8              │
│                                           │
│ Trend: Improving (9 days → 4 days)        │
└───────────────────────────────────────────┘
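Computing it is straightforward once each work item carries a start and a done timestamp. A minimal sketch in Python, using invented field layouts and dates rather than any particular tool's data model:

```python
from datetime import datetime
from statistics import mean

# Hypothetical work items: (sprint, started_at, done_at)
items = [
    ("S1", datetime(2024, 1, 2), datetime(2024, 1, 11)),
    ("S1", datetime(2024, 1, 3), datetime(2024, 1, 10)),
    ("S2", datetime(2024, 1, 16), datetime(2024, 1, 22)),
    ("S2", datetime(2024, 1, 17), datetime(2024, 1, 21)),
]

def cycle_times_by_sprint(items):
    """Average cycle time in days per sprint, for trend tracking."""
    per_sprint = {}
    for sprint, started, done in items:
        per_sprint.setdefault(sprint, []).append((done - started).days)
    return {sprint: mean(days) for sprint, days in per_sprint.items()}

print(cycle_times_by_sprint(items))  # {'S1': 8, 'S2': 5}
```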
THROUGHPUT:
─────────────────────────────────────
Definition: Items completed per sprint
Why good: Measures actual completion
Track: Average over time
Use: Planning, not performance
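A quick sketch of how a rolling average might be taken over recent sprints, with made-up counts, so a single unusual sprint doesn't skew planning:

```python
from statistics import mean

# Hypothetical completed-item counts for the last six sprints
completed_per_sprint = [22, 27, 25, 30, 24, 26]

# Use the average for planning, not as a performance target
average_throughput = mean(completed_per_sprint)
print(f"Average throughput: {average_throughput:.1f} items/sprint")  # 25.7
```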
PREDICTABILITY:
─────────────────────────────────────
Definition: % of committed work completed
Why good: Measures realistic planning
Track: Should be 80-100%
Warning: If always 100%, commitments too low
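Predictability is simply completed committed items divided by committed items, per sprint. A sketch with illustrative numbers:

```python
# Hypothetical (committed, completed) pairs per sprint
sprints = [(25, 22), (24, 24), (26, 21), (25, 23)]

for committed, completed in sprints:
    predictability = completed / committed * 100
    print(f"Committed {committed}, completed {completed}: {predictability:.0f}%")
```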
LEAD TIME:
─────────────────────────────────────
Definition: Time from request to delivery
Why good: Customer-centric view
Track: By work type
Target: Reduce for high priority
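Lead time uses the same arithmetic as cycle time, but the clock starts at the request, and it is most informative broken down by work type. A sketch over hypothetical request records:

```python
from datetime import date
from statistics import mean

# Hypothetical requests: (work_type, requested_on, delivered_on)
requests = [
    ("bug", date(2024, 3, 1), date(2024, 3, 4)),
    ("bug", date(2024, 3, 5), date(2024, 3, 7)),
    ("feature", date(2024, 3, 1), date(2024, 3, 20)),
]

# Group lead times by work type
by_type = {}
for work_type, requested, delivered in requests:
    by_type.setdefault(work_type, []).append((delivered - requested).days)

for work_type, days in by_type.items():
    print(f"{work_type}: avg lead time {mean(days):.1f} days")
```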
Quality Metrics
QUALITY METRICS THAT WORK
═════════════════════════
ESCAPED DEFECTS:
─────────────────────────────────────
Definition: Bugs found in production
Why good: Measures what matters (quality to users)
Track: Count and severity
Target: Zero critical bugs
CHANGE FAILURE RATE:
─────────────────────────────────────
Definition: % of deploys causing incidents
Why good: Measures deployment quality
DORA metric: Elite teams < 15%
Track: Trend over time
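The calculation itself is just failed deployments over total deployments for a period. A sketch with invented counts; what counts as a "failure" (incident, rollback, hotfix) is a definition your team needs to agree on up front:

```python
# Hypothetical monthly numbers
total_deploys = 25
failed_deploys = 2   # deploys that caused an incident or rollback

change_failure_rate = failed_deploys / total_deploys * 100
print(f"Change failure rate: {change_failure_rate:.0f}%")  # 8%
```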
TIME TO RECOVERY:
─────────────────────────────────────
Definition: Time to fix production issues
Why good: Measures response capability
DORA metric: Elite teams < 1 hour
Track: By severity
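Time to recovery is the gap between when an incident starts and when service is restored, summarized per severity. A sketch with fabricated incidents:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incidents: (severity, started, resolved)
incidents = [
    ("critical", datetime(2024, 5, 1, 9, 0),  datetime(2024, 5, 1, 9, 40)),
    ("critical", datetime(2024, 5, 8, 14, 0), datetime(2024, 5, 8, 14, 50)),
    ("minor",    datetime(2024, 5, 3, 11, 0), datetime(2024, 5, 3, 16, 0)),
]

# Group recovery times (in minutes) by severity
by_severity = {}
for severity, started, resolved in incidents:
    minutes = (resolved - started).total_seconds() / 60
    by_severity.setdefault(severity, []).append(minutes)

for severity, minutes in by_severity.items():
    print(f"{severity}: mean time to recovery {mean(minutes):.0f} min")
```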
TECH DEBT RATIO:
─────────────────────────────────────
Definition: Debt items vs. total backlog
Why good: Tracks accumulation
Track: Should stay stable or decrease
Warning: If growing, address it
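As defined above, the ratio is debt-labeled backlog items over all backlog items. A sketch assuming a hypothetical "tech-debt" label; adapt it to however your backlog marks debt:

```python
# Hypothetical backlog: each item carries a set of labels
backlog = [
    {"title": "Migrate auth module", "labels": {"tech-debt"}},
    {"title": "New export feature",  "labels": {"feature"}},
    {"title": "Refactor billing",    "labels": {"tech-debt", "backend"}},
    {"title": "Fix login bug",       "labels": {"bug"}},
]

debt_items = sum(1 for item in backlog if "tech-debt" in item["labels"])
debt_ratio = debt_items / len(backlog) * 100
print(f"Tech debt ratio: {debt_ratio:.0f}% of backlog")  # 50%
```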
Outcome Metrics
OUTCOME METRICS THAT MATTER
═══════════════════════════
USER SATISFACTION:
─────────────────────────────────────
Definition: NPS, CSAT, or feedback scores
Why good: Measures actual value delivery
Track: After releases
Use: Quality of what we build
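NPS, for instance, is the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). A sketch over made-up survey responses:

```python
# Hypothetical 0-10 survey responses collected after a release
scores = [10, 9, 9, 8, 7, 10, 6, 9, 3, 8]

promoters = sum(1 for s in scores if s >= 9)
detractors = sum(1 for s in scores if s <= 6)
nps = (promoters - detractors) / len(scores) * 100
print(f"NPS: {nps:.0f}")  # 30
```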
ADOPTION:
─────────────────────────────────────
Definition: Feature usage rates
Why good: Validates we built right thing
Track: After feature release
Use: Product decisions
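One common definition, assumed here, is the share of active users who touched the feature within a window after release; the user sets below are invented for illustration:

```python
# Hypothetical usage data for the 30 days after release
active_users = {"ana", "ben", "caro", "dev", "eli", "fay"}
used_new_feature = {"ana", "caro", "dev", "fay"}

adoption_rate = len(used_new_feature & active_users) / len(active_users) * 100
print(f"Feature adoption: {adoption_rate:.0f}%")  # 67%
```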
BUSINESS IMPACT:
─────────────────────────────────────
Definition: Revenue, conversion, retention
Why good: Connects work to business value
Track: By initiative
Use: ROI of engineering work
DEVELOPER EXPERIENCE:
─────────────────────────────────────
Definition: Survey, deployment ease
Why good: Healthy team = sustainable
Track: Quarterly survey
Warning: Declining = future problems
Dashboard Design
Balanced Metrics View
TEAM PERFORMANCE DASHBOARD
══════════════════════════
┌─────────────────────────────────────────────────────────┐
│ Team Alpha - Performance Overview                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ DELIVERY                      QUALITY                   │
│ ─────────────                 ───────                   │
│ Cycle Time: 4.2 days ↓        Escaped Bugs: 2 ↓         │
│ Throughput: 28 items/sprint   Change Failure: 8% ↓      │
│ Predictability: 87% ✓         Recovery Time: 45m ✓      │
│                                                         │
│ OUTCOMES                      HEALTH                    │
│ ────────                      ──────                    │
│ Feature Adoption: 67% ↑       Team Happiness: 4.1/5 ✓   │
│ User Satisfaction: 4.2/5      Tech Debt: 15% (stable)   │
│ Revenue Impact: +12% ↑        Burnout Risk: Low ✓       │
│                                                         │
│ TRENDS (6 month)                                        │
│ ────────────────────────────────────────────            │
│ All metrics improving or stable                         │
│ No concerning trends                                    │
│                                                         │
└─────────────────────────────────────────────────────────┘
PRINCIPLES:
├── Multiple metrics together
├── Trends, not snapshots
├── Team-level only
├── Balanced (delivery + quality + outcomes)
└── Health included
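One way to honor "trends, not snapshots" is to keep each metric as a small team-level series and report only its direction. A minimal sketch; the metric names, values, and the 5% stability threshold are illustrative choices, not a standard:

```python
# Hypothetical team-level series, oldest to newest (one value per sprint)
metrics = {
    "cycle_time_days": [9, 8, 6, 6, 5, 4],                 # lower is better
    "escaped_defects": [5, 4, 4, 3, 2, 2],                 # lower is better
    "team_happiness":  [3.6, 3.8, 3.9, 4.0, 4.1, 4.1],     # higher is better
}

def trend(series, lower_is_better=True):
    """Compare the recent half to the older half; report a direction, not a score."""
    half = len(series) // 2
    older = sum(series[:half]) / half
    recent = sum(series[half:]) / (len(series) - half)
    if abs(recent - older) < 0.05 * max(abs(older), 1):
        return "stable"
    improving = recent < older if lower_is_better else recent > older
    return "improving" if improving else "declining"

directions = {
    "cycle_time_days": trend(metrics["cycle_time_days"]),
    "escaped_defects": trend(metrics["escaped_defects"]),
    "team_happiness":  trend(metrics["team_happiness"], lower_is_better=False),
}
print(directions)  # all three report "improving"
```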
Using Metrics Well
For Improvement, Not Judgment
HEALTHY METRICS USAGE
═════════════════════
IN RETROSPECTIVES:
─────────────────────────────────────
"Cycle time increased from 4 to 6 days.
What changed? How might we improve?"
NOT:
"Cycle time increased. You need to work faster."
FOR EXPERIMENTS:
─────────────────────────────────────
"Let's try pair programming for 2 sprints
and see if quality improves."
Measure before/after
Learn from results
Adjust approach
FOR PLANNING:
─────────────────────────────────────
"Our throughput averages 25 items/sprint.
Let's commit to 24 with some buffer."
NOT:
"You did 25 last time, do 30 this time."
FOR CELEBRATION:
─────────────────────────────────────
"Cycle time dropped 40% over 6 months!
That's the result of better processes."
Celebrate improvement, not just hitting targets
Avoiding Dysfunction
PREVENTING METRIC GAMING
════════════════════════
MULTIPLE METRICS:
─────────────────────────────────────
Don't: Single metric target
Do: Balanced scorecard
If you only target velocity:
├── Quality may drop
├── Tech debt may grow
├── Burnout may increase
└── Balance with quality + health metrics
NO INDIVIDUAL RANKING:
─────────────────────────────────────
Don't: "Top 5 developers by commits"
Do: Team-level metrics only
Individual ranking:
├── Discourages helping others
├── Encourages gaming
├── Destroys collaboration
└── Harms culture
QUESTION THE METRIC:
─────────────────────────────────────
Regularly ask:
├── Is this metric still useful?
├── Are we gaming it?
├── Does it drive right behavior?
├── Should we change it?
└── Remember: metrics evolve with the team
GitScrum Analytics
Performance Tracking
GITSCRUM PERFORMANCE FEATURES
═════════════════════════════
BUILT-IN METRICS:
─────────────────────────────────────
├── Cycle time by work type
├── Throughput per sprint
├── Sprint predictability
├── Burndown/burnup charts
├── Velocity trend
└── Blocked time
CUSTOM DASHBOARDS:
─────────────────────────────────────
├── Select metrics to display
├── Set time ranges
├── Filter by project/team
├── Share with stakeholders
└── Export for analysis
QUALITY INTEGRATION:
─────────────────────────────────────
├── Link bugs to features
├── Track escaped defects
├── Monitor resolution time
└── Quality trends
REPORTS:
─────────────────────────────────────
├── Weekly performance summary
├── Sprint completion report
├── Trend analysis
├── Team comparison (careful!)
└── Custom report builder
Best Practices
For Team Metrics
- Measure outcomes — Not just outputs
- Use trends — Not snapshots
- Balance multiple metrics — Never single
- Team-level only — No individual ranking
- Drive improvement — Not judgment
Anti-Patterns
METRICS MISTAKES:
✗ Individual performance ranking
✗ Single metric focus
✗ Metrics as targets
✗ Comparing team velocities
✗ Lines of code / commits
✗ Hours worked
✗ Using metrics punitively
✗ Set and forget