Team Performance Metrics | Outcomes Over Outputs
Measure team performance with outcome-based metrics that drive improvement without gaming. GitScrum tracks cycle time, quality, and delivery trends.
8 min read
Measuring team performance is essential but dangerous. The wrong metrics create perverse incentives, gaming, and dysfunction. The right metrics illuminate reality, drive improvement, and respect complexity. Focus on outcomes over outputs, trends over snapshots, and team over individual metrics.
Metrics Philosophy
| Good Metrics | Bad Metrics |
|---|---|
| Team-level | Individual |
| Outcomes | Outputs |
| Trends over time | Point-in-time |
| Lead to improvement | Lead to gaming |
| Multiple together | Single in isolation |
Dangerous Metrics
What NOT to Measure
METRICS THAT CAUSE HARM
═══════════════════════
LINES OF CODE:
─────────────────────────────────────
Problem: More code ≠ better
Gaming: Verbose code, no refactoring
Result: Bloated, unmaintainable codebase
COMMITS PER DAY:
─────────────────────────────────────
Problem: Quantity ≠ quality
Gaming: Tiny commits, splitting work
Result: Noisy history, no real improvement
HOURS WORKED:
─────────────────────────────────────
Problem: Presence ≠ productivity
Gaming: Stay late doing nothing
Result: Burnout, resentment
BUGS FOUND (for testers):
─────────────────────────────────────
Problem: Incentivizes finding bugs over preventing them
Gaming: Report minor issues, split bugs
Result: Focus on bug counts, not quality
INDIVIDUAL VELOCITY:
─────────────────────────────────────
Problem: Discourages helping others
Gaming: Inflate estimates
Result: Team dysfunction
COMPARING TEAM VELOCITIES:
─────────────────────────────────────
Problem: Teams estimate differently
Gaming: Point inflation arms race
Result: Velocity loses meaning
Goodhart's Law in Action
GOODHART'S LAW EXAMPLES
═══════════════════════
"When a measure becomes a target,
it ceases to be a good measure."
EXAMPLE 1: Code Coverage Target
─────────────────────────────────────
Target: 80% code coverage
Gaming:
├── Tests that don't assert anything
├── Testing trivial getters/setters
├── Ignoring complex untestable code
└── Coverage high, confidence low
Better:
├── Track coverage but don't target it
├── Focus on test quality
├── Review tests in code review
└── Mutation testing
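The coverage-gaming pattern above is easy to demonstrate. In this sketch (`apply_discount` is a made-up function, not from any real codebase), both tests execute every line, so a coverage tool scores them identically; only the second one can ever catch a regression:

```python
# Hypothetical function under test.
def apply_discount(price, percent):
    return round(price * (1 - percent / 100), 2)

# Gamed test: runs the code (counts toward coverage) but asserts nothing,
# so it passes even if apply_discount is completely wrong.
def test_discount_gamed():
    apply_discount(100.0, 20)

# Real test: identical coverage, but it pins down the behavior.
def test_discount_real():
    assert apply_discount(100.0, 20) == 80.0
    assert apply_discount(19.99, 0) == 19.99

test_discount_gamed()
test_discount_real()
```

Mutation testing catches exactly this gap: mutate `1 -` to `1 +` and the gamed test still passes while the real one fails.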
EXAMPLE 2: Ticket Closure Target
─────────────────────────────────────
Target: Close 20 tickets/sprint
Gaming:
├── Split tickets artificially
├── Close without proper fix
├── Avoid hard tickets
└── Quantity up, value down
Better:
├── Track cycle time (less gameable)
├── Focus on value delivered
├── Review completion quality
└── Celebrate impact, not count
EXAMPLE 3: Response Time Target
─────────────────────────────────────
Target: Respond to bugs in < 4 hours
Gaming:
├── Quick "acknowledged" responses
├── No actual progress
├── Gaming the clock
└── Fast response, slow resolution
Better:
├── Track time to resolution
├── Customer satisfaction
├── Address root causes
└── End-to-end quality
Healthy Metrics
Team Delivery Metrics
DELIVERY METRICS THAT WORK
══════════════════════════
CYCLE TIME:
─────────────────────────────────────
Definition: Time from work started to done
Why good: Hard to game, measures flow
Track: Trend over time
Target: Decrease (faster delivery)
┌─────────────────────────────────────┐
│ Cycle Time Trend                    │
│ 12 ┤                                │
│  9 ┤ █                              │
│  6 ┤ █  █  █  █                     │
│  3 ┤ █  █  █  █  █  █  █  █        │
│  0 └────────────────────────        │
│      S1 S2 S3 S4 S5 S6 S7 S8        │
│                                     │
│ Trend: Improving (9 days → 4 days)  │
└─────────────────────────────────────┘
THROUGHPUT:
─────────────────────────────────────
Definition: Items completed per sprint
Why good: Measures actual completion
Track: Average over time
Use: Planning, not performance
PREDICTABILITY:
─────────────────────────────────────
Definition: % of committed work completed
Why good: Measures realistic planning
Track: Should be 80-100%
Warning: If always 100%, commitments too low
LEAD TIME:
─────────────────────────────────────
Definition: Time from request to delivery
Why good: Customer-centric view
Track: By work type
Target: Reduce for high priority
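All three team-tracked metrics above fall out of simple start/done timestamps. A minimal sketch, using hypothetical sprint data rather than any real GitScrum export:

```python
from datetime import date
from statistics import mean

# Hypothetical work items completed this sprint: (started, done) dates.
items = [
    (date(2024, 3, 4), date(2024, 3, 8)),
    (date(2024, 3, 4), date(2024, 3, 6)),
    (date(2024, 3, 6), date(2024, 3, 13)),
    (date(2024, 3, 11), date(2024, 3, 14)),
]

# Cycle time: work started -> done; track the trend, not a single number.
cycle_times = [(done - started).days for started, done in items]
avg_cycle_time = mean(cycle_times)  # (4 + 2 + 7 + 3) / 4 = 4.0 days

# Throughput: items actually completed in the sprint.
throughput = len(items)

# Predictability: completed vs. committed (say 5 items were committed).
committed = 5
predictability = throughput / committed * 100  # 80%, in the healthy band

print(f"avg cycle time: {avg_cycle_time:.1f} days")
print(f"throughput: {throughput} items")
print(f"predictability: {predictability:.0f}%")
```

Lead time works the same way, just measured from the original request date instead of work start.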
Quality Metrics
QUALITY METRICS THAT WORK
═════════════════════════
ESCAPED DEFECTS:
─────────────────────────────────────
Definition: Bugs found in production
Why good: Measures what matters (quality to users)
Track: Count and severity
Target: Zero critical bugs
CHANGE FAILURE RATE:
─────────────────────────────────────
Definition: % of deploys causing incidents
Why good: Measures deployment quality
DORA metric: Elite teams < 15%
Track: Trend over time
TIME TO RECOVERY:
─────────────────────────────────────
Definition: Time to fix production issues
Why good: Measures response capability
DORA metric: Elite teams < 1 hour
Track: By severity
TECH DEBT RATIO:
─────────────────────────────────────
Definition: Debt items vs. total backlog
Why good: Tracks accumulation
Track: Should stay stable or decrease
Warning: If growing, address it
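The two DORA metrics above can be derived straight from a deploy log. A sketch with an invented log (the tuple layout is illustrative, not a GitScrum API):

```python
from datetime import datetime

# Hypothetical deploy log: (deployed_at, caused_incident, recovered_at).
deploys = [
    (datetime(2024, 3, 1, 10, 0), False, None),
    (datetime(2024, 3, 3, 15, 30), True, datetime(2024, 3, 3, 16, 10)),
    (datetime(2024, 3, 5, 9, 0), False, None),
    (datetime(2024, 3, 8, 14, 0), False, None),
]

# Change failure rate: % of deploys causing incidents (DORA elite: < 15%).
failures = [d for d in deploys if d[1]]
change_failure_rate = len(failures) / len(deploys) * 100  # 1 of 4 = 25%

# Time to recovery: how long production stayed broken (DORA elite: < 1 hour).
recovery_minutes = [
    (recovered - deployed).total_seconds() / 60
    for deployed, _failed, recovered in failures
]
mttr = sum(recovery_minutes) / len(recovery_minutes)  # 40 minutes

print(f"change failure rate: {change_failure_rate:.0f}%")
print(f"mean time to recovery: {mttr:.0f}m")
```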
Outcome Metrics
OUTCOME METRICS THAT MATTER
═══════════════════════════
USER SATISFACTION:
─────────────────────────────────────
Definition: NPS, CSAT, or feedback scores
Why good: Measures actual value delivery
Track: After releases
Use: Quality of what we build
ADOPTION:
─────────────────────────────────────
Definition: Feature usage rates
Why good: Validates we built the right thing
Track: After feature release
Use: Product decisions
BUSINESS IMPACT:
─────────────────────────────────────
Definition: Revenue, conversion, retention
Why good: Connects work to business value
Track: By initiative
Use: ROI of engineering work
DEVELOPER EXPERIENCE:
─────────────────────────────────────
Definition: Survey scores, deployment ease
Why good: Healthy team = sustainable team
Track: Quarterly survey
Warning: Declining = future problems
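For user satisfaction, NPS has a standard formula: the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6), on a 0-10 survey scale. A quick sketch with made-up survey responses:

```python
# Hypothetical post-release survey: "How likely are you to recommend us?" (0-10)
responses = [10, 9, 9, 8, 7, 7, 6, 4, 10, 9]

promoters = sum(1 for r in responses if r >= 9)   # 9-10: promoters
detractors = sum(1 for r in responses if r <= 6)  # 0-6: detractors
# Scores of 7-8 are "passives" and drop out of the formula.
nps = (promoters - detractors) / len(responses) * 100

print(f"NPS: {nps:+.0f}")  # ranges from -100 to +100; track the trend per release
```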
Dashboard Design
Balanced Metrics View
TEAM PERFORMANCE DASHBOARD
══════════════════════════
┌─────────────────────────────────────────────────────────┐
│ Team Alpha - Performance Overview                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ DELIVERY                        QUALITY                 │
│ ────────                        ───────                 │
│ Cycle Time: 4.2 days ↓          Escaped Bugs: 2 ↓       │
│ Throughput: 28 items/sprint     Change Failure: 8% ↓    │
│ Predictability: 87% ↑           Recovery Time: 45m ↓    │
│                                                         │
│ OUTCOMES                        HEALTH                  │
│ ────────                        ──────                  │
│ Feature Adoption: 67% ↑         Team Happiness: 4.1/5 ↑ │
│ User Satisfaction: 4.2/5        Tech Debt: 15% (stable) │
│ Revenue Impact: +12% ↑          Burnout Risk: Low ↓     │
│                                                         │
│ TRENDS (6 months)                                       │
│ ─────────────────                                       │
│ All metrics improving or stable                         │
│ No concerning trends                                    │
│                                                         │
└─────────────────────────────────────────────────────────┘
PRINCIPLES:
├── Multiple metrics together
├── Trends, not snapshots
├── Team-level only
├── Balanced (delivery + quality + outcomes)
└── Health included
Using Metrics Well
For Improvement, Not Judgment
HEALTHY METRICS USAGE
═════════════════════
IN RETROSPECTIVES:
─────────────────────────────────────
"Cycle time increased from 4 to 6 days.
What changed? How might we improve?"
NOT:
"Cycle time increased. You need to work faster."
FOR EXPERIMENTS:
─────────────────────────────────────
"Let's try pair programming for 2 sprints
and see if quality improves."
Measure before/after
Learn from results
Adjust approach
FOR PLANNING:
─────────────────────────────────────
"Our throughput averages 25 items/sprint.
Let's commit to 24 with some buffer."
NOT:
"You did 25 last time, do 30 this time."
FOR CELEBRATION:
─────────────────────────────────────
"Cycle time dropped 40% over 6 months!
That's the result of better processes."
Celebrate improvement, not just hitting targets
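The planning guidance above is just an average with a margin. A sketch, assuming six sprints of hypothetical throughput history and an arbitrary 5% buffer:

```python
from statistics import mean

# Hypothetical throughput (items completed) over the last six sprints.
history = [23, 26, 25, 24, 27, 25]

# Plan from the recent average minus a small buffer for the unexpected,
# rather than treating the best sprint as the new floor.
average = mean(history)           # 25.0 items/sprint
commitment = int(average * 0.95)  # ~5% buffer -> commit to 23 items

print(f"average throughput: {average:.0f} items/sprint")
print(f"suggested commitment: {commitment} items")
```

The buffer size is a team choice; the point is that the commitment comes from observed capacity, not from ratcheting last sprint's number upward.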
Avoiding Dysfunction
PREVENTING METRIC GAMING
════════════════════════
MULTIPLE METRICS:
─────────────────────────────────────
Don't: Single metric target
Do: Balanced scorecard
If you only target velocity:
├── Quality may drop
├── Tech debt may grow
├── Burnout may increase
└── Balance with quality + health metrics
NO INDIVIDUAL RANKING:
─────────────────────────────────────
Don't: "Top 5 developers by commits"
Do: Team-level metrics only
Individual ranking:
├── Discourages helping others
├── Encourages gaming
├── Destroys collaboration
└── Harms culture
QUESTION THE METRIC:
─────────────────────────────────────
Regularly ask:
├── Is this metric still useful?
├── Are we gaming it?
├── Does it drive the right behavior?
├── Should we change it?
└── Metrics evolve with the team
GitScrum Analytics
Performance Tracking
GITSCRUM PERFORMANCE FEATURES
═════════════════════════════
BUILT-IN METRICS:
─────────────────────────────────────
├── Cycle time by work type
├── Throughput per sprint
├── Sprint predictability
├── Burndown/burnup charts
├── Velocity trend
└── Blocked time
CUSTOM DASHBOARDS:
─────────────────────────────────────
├── Select metrics to display
├── Set time ranges
├── Filter by project/team
├── Share with stakeholders
└── Export for analysis
QUALITY INTEGRATION:
─────────────────────────────────────
├── Link bugs to features
├── Track escaped defects
├── Monitor resolution time
└── Quality trends
REPORTS:
─────────────────────────────────────
├── Weekly performance summary
├── Sprint completion report
├── Trend analysis
├── Team comparison (careful!)
└── Custom report builder
Best Practices
For Team Metrics
Anti-Patterns
METRICS MISTAKES:
❌ Individual performance ranking
❌ Single metric focus
❌ Metrics as targets
❌ Comparing team velocities
❌ Lines of code / commits
❌ Hours worked
❌ Using metrics punitively
❌ Set and forget