Project Estimation | Story Points & Planning Poker

Estimate projects with story points, Planning Poker, and T-shirt sizing. GitScrum tracks velocity and helps teams calibrate estimates over time.

Accurate estimation is the foundation of realistic planning: underestimate and you miss deadlines; overestimate and you lose trust. GitScrum's effort points and historical velocity data help teams calibrate estimates against actual performance, so accuracy improves over time. The key is to treat estimation as a skill to develop, not an exact science.

Estimation Approaches

Technique	Best For	Accuracy	Effort
Story Points	Sprint planning	Medium-High	Medium
Planning Poker	Team consensus	Medium-High	Medium
T-Shirt Sizing	Roadmap planning	Low-Medium	Low
Historical Data	Forecasting	High	Low
Three-Point	Risk-aware estimates	Medium-High	High

Story Point Estimation

STORY POINT FUNDAMENTALS

WHAT POINTS REPRESENT:
┌─────────────────────────────────────────────────┐
│  Story points measure relative effort:          │
│                                                 │
│  Effort = Complexity + Uncertainty + Volume     │
│                                                 │
│  NOT measured in hours because:                 │
│  ├── Different people work at different speeds  │
│  ├── Same person varies day to day              │
│  └── Points are more stable over time           │
└─────────────────────────────────────────────────┘

POINT SCALE (Fibonacci):
┌─────────────────────────────────────────────────┐
│  1 point:  Trivial, well-understood             │
│            Example: Text change, config update  │
│                                                 │
│  2 points: Small, straightforward               │
│            Example: Add field to form           │
│                                                 │
│  3 points: Medium, some complexity              │
│            Example: New API endpoint            │
│                                                 │
│  5 points: Large, multiple components           │
│            Example: Feature with UI + backend   │
│                                                 │
│  8 points: Very large, significant unknowns     │
│            Example: New integration             │
│            Consider: Should this be split?      │
│                                                 │
│  13 points: Epic-sized, too large               │
│             Must be broken down                 │
└─────────────────────────────────────────────────┘
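The split guidance in the scale above can be written down as a small lookup. This is a minimal sketch; `SCALE` and `split_advice` are hypothetical names used for illustration, not part of GitScrum.

```python
# Card values from the point scale above.
SCALE = (1, 2, 3, 5, 8, 13)


def split_advice(points: int) -> str:
    """Return the scale's guidance for an estimate (hypothetical helper)."""
    if points not in SCALE:
        raise ValueError(f"{points} is not on the scale {SCALE}")
    if points >= 13:
        return "must be broken down"
    if points == 8:
        return "consider splitting"
    return "ok to plan"


for p in (3, 8, 13):
    print(p, "->", split_advice(p))
```

Rejecting values that are not on the scale keeps estimates relative: a story is a 5 or an 8, never a 6.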

REFERENCE STORIES:
┌─────────────────────────────────────────────────┐
│  Establish team calibration stories:            │
│                                                 │
│  "Adding a new dashboard widget" = 3 points     │
│  "Simple bug fix" = 1 point                     │
│  "New feature end-to-end" = 5-8 points          │
│                                                 │
│  Compare new stories to references:             │
│  "Is this harder or easier than the widget?"    │
└─────────────────────────────────────────────────┘

Planning Poker

PLANNING POKER PROCESS

SETUP:
┌─────────────────────────────────────────────────┐
│  Participants: Development team (everyone who   │
│  will work on the items)                        │
│                                                 │
│  Cards: 0, 1, 2, 3, 5, 8, 13, ?, ☕             │
│  ├── 0: Already done or trivial                 │
│  ├── 1-13: Story point estimates                │
│  ├── ?: Need more information                   │
│  └── ☕: Need a break                           │
└─────────────────────────────────────────────────┘

PROCESS:
┌─────────────────────────────────────────────────┐
│  1. Product Owner reads user story              │
│                                                 │
│  2. Team asks clarifying questions              │
│     (Time-box: 2-3 minutes)                     │
│                                                 │
│  3. Everyone privately selects a card           │
│                                                 │
│  4. All reveal simultaneously                   │
│                                                 │
│  5. Discuss differences:                        │
│     ├── Highest: "Why do you think 8?"          │
│     ├── Lowest: "Why do you think 2?"           │
│     └── Discussion surfaces hidden complexity   │
│                                                 │
│  6. Re-vote if needed                           │
│     (Usually converges in 2 rounds)             │
│                                                 │
│  7. Record consensus estimate                   │
└─────────────────────────────────────────────────┘

DISCUSSION EXAMPLES:
┌─────────────────────────────────────────────────┐
│  Scenario: Votes are 2, 3, 3, 5, 8              │
│                                                 │
│  Facilitator: "@high, why 8?"                   │
│  @high: "We need to update the legacy auth      │
│  system, which is always problematic."          │
│                                                 │
│  Facilitator: "@low, why 2?"                    │
│  @low: "I thought we were just adding a         │
│  button. I didn't realize auth was involved."   │
│                                                 │
│  Result: Team now has same understanding        │
│  Re-vote: 5, 5, 5, 5, 8 → Consensus at 5        │
└─────────────────────────────────────────────────┘
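The reveal-and-discuss step can be sketched in a few lines. The example below assumes a round counts as consensus when at least 80% of voters pick the same card; that threshold, the function name `review_round`, and the voter names are illustrative assumptions, since the process above leaves the consensus rule to the team.

```python
from collections import Counter


def review_round(votes: dict[str, int], agreement: float = 0.8) -> dict:
    """Summarize one Planning Poker reveal.

    `agreement` is an assumed threshold: the share of voters who must
    pick the same card before the round counts as consensus.
    """
    counts = Counter(votes.values())
    card, n = counts.most_common(1)[0]
    return {
        # The most popular card, or None if too few voters chose it.
        "consensus": card if n / len(votes) >= agreement else None,
        # The facilitator asks these two voters to explain their cards.
        "ask_high": max(votes, key=votes.get),
        "ask_low": min(votes, key=votes.get),
    }


# Votes from the scenario above (voter names are made up).
round_1 = {"ana": 2, "ben": 3, "cy": 3, "dee": 5, "eli": 8}
round_2 = {"ana": 5, "ben": 5, "cy": 5, "dee": 5, "eli": 8}

print(review_round(round_1))  # no consensus; ask eli (8) and ana (2)
print(review_round(round_2))  # consensus at 5
```

A stricter team could set `agreement=1.0` and require a unanimous vote before recording the estimate.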

T-Shirt Sizing

T-SHIRT SIZING FOR ROADMAP

T-SHIRT SCALE:
┌─────────────────────────────────────────────────┐
│  XS: < 1 week of work                           │
│  S:  1-2 weeks                                  │
│  M:  2-4 weeks                                  │
│  L:  1-2 months                                 │
│  XL: 2-3 months                                 │
│  XXL: 3+ months (needs breakdown)               │
└─────────────────────────────────────────────────┘

ROADMAP EXAMPLE:
┌─────────────────────────────────────────────────┐
│  Q2 2025 Initiatives:                           │
│                                                 │
│  ├── Mobile app redesign          [L]           │
│  ├── API v3 migration             [XL]          │
│  ├── Performance improvements     [M]           │
│  ├── User dashboard refresh       [M]           │
│  └── Security audit fixes         [S]           │
│                                                 │
│  Capacity: 1 L + 2 M + 2 S per quarter          │
│  Current load: 1 XL + 1 L + 2 M + 1 S           │
│  Decision: Defer API v3 or get more capacity    │
└─────────────────────────────────────────────────┘
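One way to make the capacity comparison above concrete is to convert each size to a nominal number of weeks. The sketch below assumes the midpoint of each range in the T-shirt scale (with one month taken as 4 weeks); those midpoints are illustrative assumptions, not GitScrum values.

```python
# Assumed midpoints, in weeks, of the T-shirt ranges above
# (1 month taken as 4 weeks). Illustrative values only.
WEEKS = {"S": 1.5, "M": 3.0, "L": 6.0, "XL": 10.0}


def total_weeks(sizes: list[str]) -> float:
    """Sum the nominal weeks for a list of T-shirt sizes."""
    return sum(WEEKS[size] for size in sizes)


# Capacity from the roadmap example: 1 L + 2 M + 2 S per quarter.
capacity = total_weeks(["L", "M", "M", "S", "S"])

initiatives = {
    "Mobile app redesign": "L",
    "API v3 migration": "XL",
    "Performance improvements": "M",
    "User dashboard refresh": "M",
    "Security audit fixes": "S",
}
load = total_weeks(list(initiatives.values()))
print(f"capacity {capacity} weeks, load {load} weeks, over by {load - capacity}")

# Check the "defer API v3" option from the example.
without_api = total_weeks(
    [size for name, size in initiatives.items() if name != "API v3 migration"]
)
print("fits after deferring API v3:", without_api <= capacity)
```

Under these assumed midpoints the quarter is overloaded by 8.5 weeks, and deferring the XL item brings the load back under capacity, which matches the decision in the example.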

CONVERTING T-SHIRT TO SPRINT POINTS:
┌─────────────────────────────────────────────────┐
│  After roadmap approval, break down:            │
│                                                 │
│  Mobile app redesign [L]:                       │
│  ├── UX research & design: 20 pts               │
│  ├── Core navigation: 25 pts                    │
│  ├── Feature parity: 35 pts                     │
│  └── Polish & testing: 15 pts                   │
│  Total: 95 pts ≈ 3.2 sprints at 30 pts/sprint   │
│  Plan for 4 sprints (round up)                  │
└─────────────────────────────────────────────────┘
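The sprint count follows from dividing the total by velocity and rounding up, because a partly used sprint still occupies a full slot on the calendar. A minimal sketch, using the breakdown above and a hypothetical helper `sprints_needed`:

```python
import math


def sprints_needed(total_points: int, velocity: int) -> int:
    """Whole sprints required to finish `total_points` at a given velocity."""
    return math.ceil(total_points / velocity)


breakdown = {
    "UX research & design": 20,
    "Core navigation": 25,
    "Feature parity": 35,
    "Polish & testing": 15,
}
total = sum(breakdown.values())
print(total, "points:", total / 30, "sprints of work,",
      sprints_needed(total, 30), "sprints on the calendar")
```

Here 95 points at 30 points per sprint is a little over 3 sprints of work, so the redesign needs 4 sprints to complete unless scope is trimmed by about 5 points.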

Three-Point Estimation

THREE-POINT ESTIMATION

FORMULA:
┌─────────────────────────────────────────────────┐
│  Expected = (Optimistic + 4×MostLikely + Pess)  │
│             ────────────────────────────────    │
│                          6                      │
│                                                 │
│  Standard Deviation = (Pessimistic - Optimistic)│
│                       ──────────────────────    │
│                               6                 │
└─────────────────────────────────────────────────┘

EXAMPLE:
┌─────────────────────────────────────────────────┐
│  Feature: Payment integration                   │
│                                                 │
│  Optimistic:    2 weeks  (everything goes right)│
│  Most Likely:   4 weeks  (normal bumps)         │
│  Pessimistic:   8 weeks  (major issues)         │
│                                                 │
│  Expected = (2 + 4×4 + 8) / 6 = 4.3 weeks       │
│  Std Dev = (8 - 2) / 6 = 1 week                 │
│                                                 │
│  Estimate: 4-5 weeks (expected to +1 std dev)   │
│  Tell stakeholders: "Plan for 5 weeks"          │
└─────────────────────────────────────────────────┘
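The formula and the payment integration example can be checked with a short function. This is a sketch of the two equations above; the function name `three_point` is an illustrative choice.

```python
def three_point(optimistic: float, most_likely: float,
                pessimistic: float) -> tuple[float, float]:
    """Return (expected, standard deviation) for a three-point estimate."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


# Payment integration: 2, 4, and 8 weeks.
expected, std_dev = three_point(2, 4, 8)
print(f"expected {expected:.1f} weeks, std dev {std_dev:.1f} week")
print(f"one std dev range: {expected - std_dev:.1f} to "
      f"{expected + std_dev:.1f} weeks")
```

The upper end of the one-standard-deviation range is about 5.3 weeks, which is why the example tells stakeholders to plan for 5 weeks rather than the 4.3-week expected value.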

Improving Estimates

ESTIMATION CALIBRATION

TRACK ACTUAL VS ESTIMATED:
┌─────────────────────────────────────────────────┐
│  Sprint 22 Analysis:                            │
│                                                 │
│  Story          Estimated   Actual    Variance  │
│  ──────────────────────────────────────────     │
│  User profile       3         3        0%       │
│  Search filter      5         8       +60%      │
│  Dashboard          5         4       -20%      │
│  API endpoint       3         3        0%       │
│  Report export      5         7       +40%      │
│                                                 │
│  Average variance: +16%                         │
│  Largest miss: Search filter                    │
│                                                 │
│  Discuss: Why was search filter underestimated? │
│  Learning: Filtering with multiple data sources │
│  is more complex than expected.                 │
└─────────────────────────────────────────────────┘
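The variance column is the difference between actual and estimated, expressed as a percentage of the estimate. The sketch below reproduces the Sprint 22 numbers, assuming the Actual column is in the same units as the estimate; `variance_pct` is a hypothetical helper.

```python
def variance_pct(estimated: int, actual: int) -> float:
    """Signed variance as a percentage of the estimate."""
    return (actual - estimated) * 100 / estimated


# (story, estimated, actual) from the Sprint 22 analysis above.
stories = [
    ("User profile", 3, 3),
    ("Search filter", 5, 8),
    ("Dashboard", 5, 4),
    ("API endpoint", 3, 3),
    ("Report export", 5, 7),
]

rows = [(name, variance_pct(est, act)) for name, est, act in stories]
average = sum(v for _, v in rows) / len(rows)
largest_miss = max(rows, key=lambda row: abs(row[1]))

for name, v in rows:
    print(f"{name:<15}{v:+.0f}%")
print(f"Average variance: {average:+.0f}%")
print("Largest miss:", largest_miss[0])
```

Because positive and negative misses cancel in a signed average, a team may also want to look at the average of the absolute values, which is 24% for this sprint.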

ESTIMATION IMPROVEMENT LOOP:
┌─────────────────────────────────────────────────┐
│  1. Estimate before sprint                      │
│  2. Track actual effort during sprint           │
│  3. Compare at sprint end                       │
│  4. Discuss large variances in retro            │
│  5. Update reference stories if needed          │
│  6. Apply learnings to next estimates           │
│                                                 │
│  Over time: Team estimates become more accurate │
└─────────────────────────────────────────────────┘

COMMON ESTIMATION MISTAKES:
┌─────────────────────────────────────────────────┐
│  ✗ Forgetting testing time                      │
│  ✗ Ignoring code review cycles                  │
│  ✗ Not accounting for meetings                  │
│  ✗ Assuming best-case scenario                  │
│  ✗ Comparing to senior dev's speed              │
│  ✗ Missing integration complexity               │
│  ✗ Underestimating unknowns                     │
│                                                 │
│  Fix: Include all work in estimate, add buffer  │
└─────────────────────────────────────────────────┘

Best Practices

Estimate as a team for shared understanding

Use relative sizing (points) for sprint work

Compare to reference stories for calibration

Track actual vs estimated to improve

Break down large items before estimating

Include all work: testing, review, deployment

Add buffers for unknowns: uncertainty always exists

Re-estimate when you learn more: revising an estimate is fine

Anti-Patterns

✗ One person estimates for the team
✗ Converting points directly to hours
✗ Never reviewing estimate accuracy
✗ Estimating items that aren't understood
✗ Punishing missed estimates
✗ Sandbagging all estimates