Infrastructure as Code Management | Terraform, K8s Tracking
Manage Terraform and Kubernetes infrastructure alongside application development. GitScrum tracks IaC changes, reviews, deployment windows, and change approvals.
Infrastructure as Code (IaC) requires the same rigor as application development. GitScrum helps teams manage infrastructure work with proper review processes and change tracking.
Infrastructure Task Structure
IaC Task Format
INFRASTRUCTURE TASK STRUCTURE:

INFRASTRUCTURE CHANGE TASK:
  INFRA-100: Add Redis cluster for caching

  WHAT:
    Provision ElastiCache Redis cluster (3 nodes)

  WHY:
    Application caching layer for PROJ-456
    Supports: User session storage, API response cache

  RESOURCES:
    • ElastiCache cluster (cache.r6g.large x 3)
    • Security group
    • Parameter group
    • Subnet group

  COST IMPACT:
    Estimated: ~$400/month
    Approved: Yes (budget ticket BUDGET-123)

  BLAST RADIUS:
    New resources only - no impact to existing

  ROLLBACK:
    terraform destroy for new resources

  DEPLOYMENT WINDOW:
    Any time (no downtime expected)

  CHECKLIST:
    ☐ Terraform code
    ☐ Code review
    ☐ Apply in staging
    ☐ Test connectivity
    ☐ Apply in production
    ☐ Update documentation
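The task fields above map naturally onto a small record type. The following is a minimal sketch, assuming a team wants to validate or template such tasks in a script; the class name InfraTask, its field names, and the remaining_steps helper are illustrative assumptions, not part of GitScrum.

```python
from dataclasses import dataclass, field


@dataclass
class InfraTask:
    """One infrastructure change task. Field names are illustrative."""

    key: str
    title: str
    what: str
    why: str
    resources: list
    cost_estimate: str
    blast_radius: str
    rollback: str
    deployment_window: str
    # Maps each checklist step to whether it has been completed.
    checklist: dict = field(default_factory=dict)

    def remaining_steps(self):
        """Return unfinished checklist steps in their original order."""
        return [step for step, done in self.checklist.items() if not done]


STEPS = [
    "Terraform code",
    "Code review",
    "Apply in staging",
    "Test connectivity",
    "Apply in production",
    "Update documentation",
]

task = InfraTask(
    key="INFRA-100",
    title="Add Redis cluster for caching",
    what="Provision ElastiCache Redis cluster (3 nodes)",
    why="Application caching layer for PROJ-456",
    resources=["ElastiCache cluster", "Security group",
               "Parameter group", "Subnet group"],
    cost_estimate="~$400/month",
    blast_radius="New resources only",
    rollback="terraform destroy for new resources",
    deployment_window="Any time",
    checklist={step: False for step in STEPS},
)

# Mark the first step done and list what is still open.
task.checklist["Terraform code"] = True
print(task.remaining_steps())
```

Keeping the checklist as an ordered mapping means the next open step is always the first entry returned, which mirrors the order in which the task is meant to be worked.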
Change Categories
INFRASTRUCTURE CHANGE TYPES:

LOW RISK (Self-approve OK):
  • New resources (no existing impact)
  • Tag changes
  • Increasing capacity
  • Adding monitoring

MEDIUM RISK (Peer review required):
  • Security group changes
  • IAM policy updates
  • Configuration changes
  • Scaling policies

HIGH RISK (Multiple reviewers + window):
  • Database changes
  • Network changes
  • Destructive operations
  • Production secrets

HIGH RISK TASK:
  🔴 INFRA-150: Migrate to new VPC

  Risk: HIGH
  Downtime: Expected (maintenance window)

  APPROVAL REQUIRED:
    ☐ DevOps lead review
    ☐ Security review (network change)
    ☐ Change advisory board

  DEPLOYMENT:
    ☐ Scheduled window: Saturday 2 AM
    ☐ Rollback plan documented
    ☐ On-call team notified
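The three risk tiers above can be encoded as a lookup so that a task's required approvals follow from the kinds of change it contains. This is a sketch under stated assumptions: the change-type labels (such as "tag_change" or "iam_policy") and the function name classify are hypothetical, and treating unknown change types as high risk is a conservative choice made here, not a rule from the categories above.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical labels for the change categories listed above.
RISK_BY_CHANGE_TYPE = {
    "new_resource": Risk.LOW,
    "tag_change": Risk.LOW,
    "capacity_increase": Risk.LOW,
    "monitoring": Risk.LOW,
    "security_group": Risk.MEDIUM,
    "iam_policy": Risk.MEDIUM,
    "configuration": Risk.MEDIUM,
    "scaling_policy": Risk.MEDIUM,
    "database": Risk.HIGH,
    "network": Risk.HIGH,
    "destructive": Risk.HIGH,
    "production_secret": Risk.HIGH,
}

APPROVALS = {
    Risk.LOW: ["self-approve"],
    Risk.MEDIUM: ["peer review"],
    Risk.HIGH: ["multiple reviewers", "deployment window"],
}


def classify(change_types):
    """Return the highest risk among the given change types.

    Unknown change types are treated as HIGH (an assumption of this sketch).
    """
    if not change_types:
        raise ValueError("at least one change type is required")
    risks = [RISK_BY_CHANGE_TYPE.get(t, Risk.HIGH) for t in change_types]
    return max(risks, key=lambda risk: risk.value)


risk = classify(["tag_change", "iam_policy"])
print(risk.name, APPROVALS[risk])
```

A task that mixes categories takes the strictest tier, so a tag change bundled with an IAM policy update still requires peer review.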
Terraform Workflow
Terraform Task Flow
TERRAFORM CHANGE WORKFLOW:

1. CREATE TASK
   Document what/why/impact

2. DEVELOP
   Write Terraform code
   terraform plan locally

3. CODE REVIEW
   PR with terraform plan output
   Review changes carefully

4. STAGING APPLY
   terraform apply in staging
   Verify functionality

5. PRODUCTION APPLY
   Scheduled if high-risk
   terraform apply in production
   Verify and monitor

PR CHECKLIST FOR TERRAFORM:
  ☐ terraform fmt applied
  ☐ terraform validate passes
  ☐ Plan output included in PR
  ☐ No secrets in code
  ☐ Cost estimate included (for new resources)
  ☐ Documentation updated
  ☐ Rollback plan documented

PLAN OUTPUT IN PR:
  Plan: 3 to add, 1 to change, 0 to destroy.

  + aws_elasticache_cluster.main
  + aws_security_group.redis
  + aws_elasticache_subnet_group.main
  ~ aws_security_group.app (add egress rule)
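Reviewers often want the plan summary line checked automatically, for example to flag any PR whose plan destroys resources. The sketch below parses a summary line in the format shown in the PR example above. The function names are illustrative, and the regular expression assumes that exact wording; plans with no changes, or summaries with extra fields, are simply not matched and return None.

```python
import re

# Matches a summary such as "Plan: 3 to add, 1 to change, 0 to destroy."
PLAN_SUMMARY = re.compile(
    r"Plan: (\d+) to add, (\d+) to change, (\d+) to destroy\."
)


def parse_plan_summary(plan_text):
    """Extract add/change/destroy counts, or None if no summary is found."""
    match = PLAN_SUMMARY.search(plan_text)
    if match is None:
        return None
    add, change, destroy = (int(n) for n in match.groups())
    return {"add": add, "change": change, "destroy": destroy}


def needs_extra_review(summary):
    """Flag plans that destroy anything, since those are high-risk changes."""
    return summary["destroy"] > 0


plan_output = """
Plan: 3 to add, 1 to change, 0 to destroy.
"""
summary = parse_plan_summary(plan_output)
print(summary, needs_extra_review(summary))
```

Text scraping like this is brittle by nature. If the check becomes part of a pipeline, the machine-readable output of terraform show -json on a saved plan file is likely a sturdier input.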
Kubernetes Work
K8s Task Structure
KUBERNETES CHANGE TASK:

K8S DEPLOYMENT CHANGE:
  INFRA-200: Update API deployment configuration

  CHANGES:
    • Increase replicas: 3 → 5
    • Update resource limits
    • Add readiness probe

  MANIFESTS AFFECTED:
    • k8s/api/deployment.yaml
    • k8s/api/hpa.yaml

  ROLLOUT STRATEGY:
    Rolling update (no downtime)

  TESTING:
    ☐ Apply to staging namespace
    ☐ Verify pods healthy
    ☐ Load test
    ☐ Apply to production namespace

  ROLLBACK:
    kubectl rollout undo deployment/api

HELM UPGRADE TASK:
  INFRA-210: Upgrade ingress-nginx to 4.8.0

  Current: 4.6.0
  Target: 4.8.0

  CHANGELOG REVIEW:
    • Breaking changes: None
    • New features: Rate limiting improvements
    • Bug fixes: Memory leak fix

  RISK: Medium (ingress affects all traffic)

  TESTING:
    ☐ Upgrade in staging
    ☐ Test all ingress routes
    ☐ Monitor for errors
    ☐ Upgrade in production
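For chart upgrades like the one above, a first-pass signal is the size of the version jump between the current and target versions. The sketch below classifies it, assuming plain MAJOR.MINOR.PATCH version strings. The function names are illustrative, and the result is only a hint: a minor bump can still carry breaking changes, so the changelog review stays in the task.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(current, target):
    """Classify the jump from current to target version."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        return "none-or-downgrade"
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    return "patch"


print(bump_kind("4.6.0", "4.8.0"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "4.10.0" as older than "4.8.0".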
Change Management
Infrastructure Change Process
CHANGE MANAGEMENT:

CHANGE ADVISORY BOARD (for high-risk):

WEEKLY CAB MEETING:
  Review upcoming high-risk infrastructure changes

CAB REVIEW TASK:
  CAB-2024-05: VPC Migration Review

  Change: INFRA-150 (VPC migration)
  Requestor: @devops-lead

  REVIEW ITEMS:
    ☐ Business justification
    ☐ Technical approach
    ☐ Risk assessment
    ☐ Rollback plan
    ☐ Communication plan
    ☐ Testing completed

  DECISION: Approved / Needs changes / Rejected

  CONDITIONS:
    • Must complete during maintenance window
    • Database team on standby
    • Customer notification sent

MAINTENANCE WINDOWS:
  Standard window: Saturday 2-6 AM
  Emergency window: As needed with approval

COMMUNICATION:
  ☐ Status page scheduled maintenance
  ☐ Customer email (if significant)
  ☐ Internal Slack notification
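The standard maintenance window above is easy to enforce in tooling, for example as a guard before a high-risk production apply. This is a minimal sketch, assuming the timestamp is already in the team's local time zone and that the window covers 02:00 up to, but not including, 06:00; the function name is illustrative.

```python
from datetime import datetime


def in_standard_window(when):
    """Return True if 'when' falls on a Saturday between 02:00 and 06:00.

    datetime.weekday() numbers Monday as 0, so Saturday is 5.
    """
    return when.weekday() == 5 and 2 <= when.hour < 6


# 4 May 2024 was a Saturday.
print(in_standard_window(datetime(2024, 5, 4, 3, 30)))
```

Emergency windows are approved case by case, so they would need a separate, explicit override rather than a time check.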
Documentation
Infrastructure Documentation
INFRASTRUCTURE DOCUMENTATION:

DOCUMENTATION TASK:
  INFRA-DOC-01: Document Redis cluster setup

  Created after: INFRA-100

  DOCUMENTATION:
    ☐ Architecture diagram
    ☐ Configuration details
    ☐ Access and credentials
    ☐ Monitoring and alerts
    ☐ Runbook for common issues
    ☐ Disaster recovery

RUNBOOK TEMPLATE:
  REDIS CLUSTER RUNBOOK

  COMMON ISSUES:

  High memory usage:
    1. Check which keys are consuming memory
    2. Review TTL settings
    3. Scale up if the growth is legitimate

  Connection failures:
    1. Check security group rules
    2. Verify app subnet can reach Redis subnet
    3. Check Redis cluster status

  DISASTER RECOVERY:
    1. Snapshots: Daily automatic
    2. Restore: terraform apply with snapshot_id
    3. RTO: 30 minutes

DEFINITION OF DONE FOR INFRA:
  ☐ Code reviewed and merged
  ☐ Applied successfully
  ☐ Monitoring configured
  ☐ Documentation updated
  ☐ Runbook created/updated