We had 3 replicas across 3 nodes. We thought we were highly available. We weren't.

The setup:

  • 3 replicas of each service
  • 3 Kubernetes nodes
  • Pod anti-affinity: spread across nodes ✅
  • All nodes: us-east-1a ❌

The incident:

  • AWS us-east-1a partial outage
  • All 3 nodes affected
  • All pods evicted
  • No capacity in the zone
  • Complete service outage

Why it happened:

  • Cheaper instances in us-east-1a
  • Auto-provisioner defaulted to single AZ
  • Nobody checked node distribution

The fix:

topologySpreadConstraints:
                                    - maxSkew: 1
                                    topologyKey: topology.kubernetes.io/zone
                                    whenUnsatisfiable: DoNotSchedule

Lesson: Replicas without zone distribution is not high availability.


← Alınan Derslere Dön