Infrastructure as Code Project Management
Infrastructure as Code (IaC) requires the same rigor as application development. GitScrum helps teams manage infrastructure work with proper review processes and change tracking.
Infrastructure Task Structure
IaC Task Format
INFRASTRUCTURE TASK STRUCTURE:
┌─────────────────────────────────────────────────────────────┐
│ │
│ INFRASTRUCTURE CHANGE TASK: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ INFRA-100: Add Redis cluster for caching ││
│ │ ││
│ │ WHAT: ││
│ │ Provision ElastiCache Redis cluster (3 nodes) ││
│ │ ││
│ │ WHY: ││
│ │ Application caching layer for PROJ-456 ││
│ │ Supports: User session storage, API response cache ││
│ │ ││
│ │ RESOURCES: ││
│ │ • ElastiCache cluster (cache.r6g.large x 3) ││
│ │ • Security group ││
│ │ • Parameter group ││
│ │ • Subnet group ││
│ │ ││
│ │ COST IMPACT: ││
│ │ Estimated: ~$400/month ││
│ │ Approved: Yes (budget ticket BUDGET-123) ││
│ │ ││
│ │ BLAST RADIUS: ││
│ │ New resources only - no impact to existing ││
│ │ ││
│ │ ROLLBACK: ││
│ │ Revert the PR and terraform apply (removes new resources) ││
│ │ ││
│ │ DEPLOYMENT WINDOW: ││
│ │ Any time (no downtime expected) ││
│ │ ││
│ │ CHECKLIST: ││
│ │ ☐ Terraform code ││
│ │ ☐ Code review ││
│ │ ☐ Apply in staging ││
│ │ ☐ Test connectivity ││
│ │ ☐ Apply in production ││
│ │ ☐ Update documentation ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
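The task format above can also be captured as structured data so that incomplete tasks are caught before review. The following is a minimal sketch in Python. The `InfraTask` class and its field names are hypothetical and simply mirror the template headings; they are not part of any GitScrum API.

```python
from dataclasses import dataclass, field


@dataclass
class InfraTask:
    """Hypothetical model mirroring the headings of the task template."""
    key: str
    title: str
    what: str = ""
    why: str = ""
    resources: list[str] = field(default_factory=list)
    cost_impact: str = ""
    blast_radius: str = ""
    rollback: str = ""
    deployment_window: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of template sections that are still empty."""
        required = ["what", "why", "resources", "cost_impact",
                    "blast_radius", "rollback", "deployment_window"]
        return [name for name in required if not getattr(self, name)]


# Example based on INFRA-100, with two sections deliberately left blank.
task = InfraTask(
    key="INFRA-100",
    title="Add Redis cluster for caching",
    what="Provision ElastiCache Redis cluster (3 nodes)",
    why="Application caching layer for PROJ-456",
    resources=["ElastiCache cluster", "Security group",
               "Parameter group", "Subnet group"],
    cost_impact="~$400/month",
    blast_radius="New resources only",
)
print(task.missing_sections())  # ['rollback', 'deployment_window']
```

A check like this could run when a task moves to review, so that rollback and deployment window are never filled in as an afterthought.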
Change Categories
INFRASTRUCTURE CHANGE TYPES:
┌─────────────────────────────────────────────────────────────┐
│ │
│ LOW RISK (Self-approve OK): │
│ • New resources (no existing impact) │
│ • Tag changes │
│ • Increasing capacity │
│ • Adding monitoring │
│ │
│ MEDIUM RISK (Peer review required): │
│ • Security group changes │
│ • IAM policy updates │
│ • Configuration changes │
│ • Scaling policies │
│ │
│ HIGH RISK (Multiple reviewers + window): │
│ • Database changes │
│ • Network changes │
│ • Destructive operations │
│ • Production secrets │
│ │
│ ─────────────────────────────────────────────────────────── │
│ │
│ HIGH RISK TASK: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ 🔴 INFRA-150: Migrate to new VPC ││
│ │ ││
│ │ Risk: HIGH ││
│ │ Downtime: Expected (maintenance window) ││
│ │ ││
│ │ APPROVAL REQUIRED: ││
│ │ ☐ DevOps lead review ││
│ │ ☐ Security review (network change) ││
│ │ ☐ Change advisory board ││
│ │ ││
│ │ DEPLOYMENT: ││
│ │ ☐ Scheduled window: Saturday 2AM ││
│ │ ☐ Rollback plan documented ││
│ │ ☐ On-call team notified ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
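The risk tiers above amount to a lookup from change category to required approval. The sketch below shows one way to encode them in Python. The category labels (`"new_resource"`, `"security_group"`, and so on) are hypothetical names chosen for this example, and treating unknown categories as high risk is an assumption of the sketch rather than something the guide prescribes.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical category labels derived from the change-type list above.
RISK_BY_CATEGORY = {
    "new_resource": Risk.LOW,
    "tag_change": Risk.LOW,
    "capacity_increase": Risk.LOW,
    "monitoring": Risk.LOW,
    "security_group": Risk.MEDIUM,
    "iam_policy": Risk.MEDIUM,
    "configuration": Risk.MEDIUM,
    "scaling_policy": Risk.MEDIUM,
    "database": Risk.HIGH,
    "network": Risk.HIGH,
    "destructive": Risk.HIGH,
    "production_secret": Risk.HIGH,
}

APPROVAL_BY_RISK = {
    Risk.LOW: "self-approve",
    Risk.MEDIUM: "peer review",
    Risk.HIGH: "multiple reviewers + maintenance window",
}


def classify(categories: list[str]) -> Risk:
    """Return the highest risk among the categories a change touches.

    Unknown categories default to HIGH (an assumption of this sketch),
    as does an empty category list.
    """
    risks = [RISK_BY_CATEGORY.get(c, Risk.HIGH) for c in categories]
    return max(risks, key=lambda r: r.value, default=Risk.HIGH)


risk = classify(["new_resource", "security_group"])
print(risk.name, "->", APPROVAL_BY_RISK[risk])  # MEDIUM -> peer review
```

Taking the maximum means a change that adds new resources but also edits a security group is reviewed at the stricter level.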
Terraform Workflow
Terraform Task Flow
TERRAFORM CHANGE WORKFLOW:
┌─────────────────────────────────────────────────────────────┐
│ │
│ 1. CREATE TASK │
│ Document what/why/impact │
│ │
│ 2. DEVELOP │
│ Write Terraform code │
│ terraform plan locally │
│ │
│ 3. CODE REVIEW │
│ PR with terraform plan output │
│ Review changes carefully │
│ │
│ 4. STAGING APPLY │
│ terraform apply in staging │
│ Verify functionality │
│ │
│ 5. PRODUCTION APPLY │
│ Scheduled if high-risk │
│ terraform apply in production │
│ Verify and monitor │
│ │
│ ─────────────────────────────────────────────────────────── │
│ │
│ PR CHECKLIST FOR TERRAFORM: │
│ │
│ ☐ terraform fmt applied │
│ ☐ terraform validate passes │
│ ☐ Plan output included in PR │
│ ☐ No secrets in code │
│ ☐ Cost estimate included (for new resources) │
│ ☐ Documentation updated │
│ ☐ Rollback plan documented │
│ │
│ PLAN OUTPUT IN PR: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ Plan: 3 to add, 1 to change, 0 to destroy. ││
│ │ ││
│ │ + aws_elasticache_cluster.main ││
│ │ + aws_security_group.redis ││
│ │ + aws_elasticache_subnet_group.main ││
│ │ ~ aws_security_group.app (add egress rule) ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
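Because the plan output is attached to every PR, its summary line can be checked automatically. The following Python sketch parses a summary in the format shown above and flags plans that destroy resources. The function names are hypothetical, and the pattern assumes the "N to add, N to change, N to destroy" wording; other Terraform versions may phrase the summary differently, so treat the regular expression as a starting point.

```python
import re

# Matches the counts in a summary such as
# "Plan: 3 to add, 1 to change, 0 to destroy."
PLAN_COUNTS = re.compile(r"(\d+) to add, (\d+) to change, (\d+) to destroy")


def parse_plan_summary(plan_text: str) -> dict[str, int]:
    """Extract add/change/destroy counts from terraform plan output."""
    match = PLAN_COUNTS.search(plan_text)
    if match is None:
        raise ValueError("no plan summary line found")
    add, change, destroy = (int(n) for n in match.groups())
    return {"add": add, "change": change, "destroy": destroy}


def needs_extra_review(summary: dict[str, int]) -> bool:
    """Destructive plans fall under the high-risk category."""
    return summary["destroy"] > 0


plan_output = """Plan: 3 to add, 1 to change, 0 to destroy.

+ aws_elasticache_cluster.main
+ aws_security_group.redis
+ aws_elasticache_subnet_group.main
~ aws_security_group.app
"""
summary = parse_plan_summary(plan_output)
print(summary)                      # {'add': 3, 'change': 1, 'destroy': 0}
print(needs_extra_review(summary))  # False
```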
Kubernetes Work
K8s Task Structure
KUBERNETES CHANGE TASK:
┌─────────────────────────────────────────────────────────────┐
│ │
│ K8S DEPLOYMENT CHANGE: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ INFRA-200: Update API deployment configuration ││
│ │ ││
│ │ CHANGES: ││
│ │ • Increase replicas: 3 → 5 ││
│ │ • Update resource limits ││
│ │ • Add readiness probe ││
│ │ ││
│ │ MANIFESTS AFFECTED: ││
│ │ • k8s/api/deployment.yaml ││
│ │ • k8s/api/hpa.yaml ││
│ │ ││
│ │ ROLLOUT STRATEGY: ││
│ │ Rolling update (no downtime) ││
│ │ ││
│ │ TESTING: ││
│ │ ☐ Apply to staging namespace ││
│ │ ☐ Verify pods healthy ││
│ │ ☐ Load test ││
│ │ ☐ Apply to production namespace ││
│ │ ││
│ │ ROLLBACK: ││
│ │ kubectl rollout undo deployment/api ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ HELM UPGRADE TASK: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ INFRA-210: Upgrade ingress-nginx to 4.8.0 ││
│ │ ││
│ │ Current: 4.6.0 ││
│ │ Target: 4.8.0 ││
│ │ ││
│ │ CHANGELOG REVIEW: ││
│ │ • Breaking changes: None ││
│ │ • New features: Rate limiting improvements ││
│ │ • Bug fixes: Memory leak fix ││
│ │ ││
│ │ RISK: Medium (ingress affects all traffic) ││
│ │ ││
│ │ TESTING: ││
│ │ ☐ Upgrade in staging ││
│ │ ☐ Test all ingress routes ││
│ │ ☐ Monitor for errors ││
│ │ ☐ Upgrade in production ││
│ └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
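The staging-then-production order in the INFRA-200 checklist can be written down as an explicit command sequence. The Python sketch below only builds the command strings and does not run them. It assumes the namespaces are literally named `staging` and `production` and that a 120-second rollout timeout is acceptable; both are illustrative choices, not values from the task.

```python
def rollout_commands(
    manifests: list[str],
    deployment: str,
    namespaces: tuple[str, ...] = ("staging", "production"),  # assumed names
) -> list[str]:
    """Build kubectl commands that apply manifests one namespace at a time.

    Each namespace's rollout is checked before moving on to the next.
    """
    commands = []
    for ns in namespaces:
        for path in manifests:
            commands.append(f"kubectl apply -n {ns} -f {path}")
        commands.append(
            f"kubectl rollout status deployment/{deployment} -n {ns} --timeout=120s"
        )
    return commands


def rollback_command(deployment: str, namespace: str) -> str:
    """Build the rollback command named in the task's ROLLBACK section."""
    return f"kubectl rollout undo deployment/{deployment} -n {namespace}"


for cmd in rollout_commands(["k8s/api/deployment.yaml", "k8s/api/hpa.yaml"], "api"):
    print(cmd)
print(rollback_command("api", "production"))
```

Keeping the sequence in one place makes it easy to paste into the task description, so reviewers see exactly what will run in each namespace.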
Change Management
Infrastructure Change Process
CHANGE MANAGEMENT:
┌─────────────────────────────────────────────────────────────┐
│ │
│ CHANGE ADVISORY BOARD (for high-risk): │
│ │
│ WEEKLY CAB MEETING: │
│ Review upcoming high-risk infrastructure changes │
│ │
│ CAB REVIEW TASK: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ CAB-2024-05: VPC Migration Review ││
│ │ ││
│ │ Change: INFRA-150 (VPC migration) ││
│ │ Requestor: @devops-lead ││
│ │ ││
│ │ REVIEW ITEMS: ││
│ │ ☐ Business justification ││
│ │ ☐ Technical approach ││
│ │ ☐ Risk assessment ││
│ │ ☐ Rollback plan ││
│ │ ☐ Communication plan ││
│ │ ☐ Testing completed ││
│ │ ││
│ │ DECISION: Approved / Needs changes / Rejected ││
│ │ ││
│ │ CONDITIONS: ││
│ │ • Must complete during maintenance window ││
│ │ • Database team on standby ││
│ │ • Customer notification sent ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ MAINTENANCE WINDOWS: │
│ │
│ Standard window: Saturday 2-6 AM │
│ Emergency window: As needed with approval │
│ │
│ COMMUNICATION: │
│ ☐ Status page scheduled maintenance │
│ ☐ Customer email (if significant) │
│ ☐ Internal Slack notification │
└─────────────────────────────────────────────────────────────┘
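A scheduled start time can be validated against the standard maintenance window before a high-risk change is approved. This is a minimal Python sketch; it assumes the window is expressed in the team's local time and that the end hour (6 AM) is exclusive. The function name is hypothetical.

```python
from datetime import datetime

WINDOW_WEEKDAY = 5     # datetime.weekday(): Monday is 0, so Saturday is 5
WINDOW_START_HOUR = 2  # 2 AM, inclusive
WINDOW_END_HOUR = 6    # 6 AM, exclusive (assumption of this sketch)


def in_standard_window(when: datetime) -> bool:
    """Return True if `when` falls inside the Saturday 2-6 AM window."""
    return (when.weekday() == WINDOW_WEEKDAY
            and WINDOW_START_HOUR <= when.hour < WINDOW_END_HOUR)


print(in_standard_window(datetime(2024, 5, 4, 2, 30)))  # True  (Saturday 2:30 AM)
print(in_standard_window(datetime(2024, 5, 4, 6, 0)))   # False (window has closed)
print(in_standard_window(datetime(2024, 5, 6, 3, 0)))   # False (Monday)
```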
Documentation
Infrastructure Documentation
INFRASTRUCTURE DOCUMENTATION:
┌─────────────────────────────────────────────────────────────┐
│ │
│ DOCUMENTATION TASK: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ INFRA-DOC-01: Document Redis cluster setup ││
│ │ ││
│ │ Created after: INFRA-100 ││
│ │ ││
│ │ DOCUMENTATION: ││
│ │ ☐ Architecture diagram ││
│ │ ☐ Configuration details ││
│ │ ☐ Access and credentials ││
│ │ ☐ Monitoring and alerts ││
│ │ ☐ Runbook for common issues ││
│ │ ☐ Disaster recovery ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ RUNBOOK TEMPLATE: │
│ ┌─────────────────────────────────────────────────────────┐│
│ │ REDIS CLUSTER RUNBOOK ││
│ │ ││
│ │ COMMON ISSUES: ││
│ │ ││
│ │ High memory usage: ││
│ │ 1. Check which keys are consuming memory ││
│ │ 2. Review TTL settings ││
│ │ 3. Scale up if legitimate growth ││
│ │ ││
│ │ Connection failures: ││
│ │ 1. Check security group rules ││
│ │ 2. Verify app subnet can reach Redis subnet ││
│ │ 3. Check Redis cluster status ││
│ │ ││
│ │ DISASTER RECOVERY: ││
│ │ 1. Snapshots: Daily automatic ││
│ │ 2. Restore: terraform apply with snapshot_name set ││
│ │ 3. RTO: 30 minutes ││
│ └─────────────────────────────────────────────────────────┘│
│ │
│ DEFINITION OF DONE FOR INFRA: │
│ ☐ Code reviewed and merged │
│ ☐ Applied successfully │
│ ☐ Monitoring configured │
│ ☐ Documentation updated │
│ ☐ Runbook created/updated │
└─────────────────────────────────────────────────────────────┘
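The definition of done above can double as a closing gate on infrastructure tasks. The Python sketch below lists what is still outstanding for a task; the function names are hypothetical and the item strings are copied from the checklist.

```python
# Items copied from the "Definition of Done for Infra" checklist.
DEFINITION_OF_DONE = [
    "Code reviewed and merged",
    "Applied successfully",
    "Monitoring configured",
    "Documentation updated",
    "Runbook created/updated",
]


def remaining_items(completed: set[str]) -> list[str]:
    """Return checklist items not yet completed, in checklist order."""
    return [item for item in DEFINITION_OF_DONE if item not in completed]


def is_done(completed: set[str]) -> bool:
    """A task is done only when every checklist item is completed."""
    return not remaining_items(completed)


completed = {"Code reviewed and merged", "Applied successfully"}
print(remaining_items(completed))
# ['Monitoring configured', 'Documentation updated', 'Runbook created/updated']
print(is_done(completed))  # False
```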