Node ran out of disk. All pods evicted. New pods scheduled on same node. Evicted again. Repeat forever.

The timeline:

  • 2:00 AM: Node disk reaches 90%
  • 2:05 AM: Kubelet starts evicting pods
  • 2:06 AM: Pods rescheduled to... the same node (it now had the most free resources)
  • 2:07 AM: More logs, more disk usage
  • 2:08 AM: Evicted again
  • 2:09 AM: Alert fires (finally)

Root cause:

  • Application logging to container filesystem
  • No log rotation configured
  • ephemeral-storage limits not set
  • Node tainted only while under disk pressure; the taint cleared as soon as evictions freed space, so the scheduler sent pods right back
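For context, the 90% trigger lines up with the kubelet's default hard-eviction threshold of nodefs.available < 10%. One mitigation (a hedged sketch; the percentages and grace period are illustrative, not what we run) is a KubeletConfiguration with a soft threshold, so the node signals pressure and drains gracefully before the hard limit hits:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
evictionSoft:
  nodefs.available: "15%"      # illustrative: warn earlier than the hard limit
evictionSoftGracePeriod:
  nodefs.available: "2m"       # pods get 2 minutes to terminate cleanly

This widens the window between "disk is filling" and "kubelet force-evicts", but it does not fix the rescheduling loop on its own.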

The fix:

resources:
  limits:
    ephemeral-storage: "2Gi"
  requests:
    ephemeral-storage: "1Gi"
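If the app genuinely must write log files, a size-capped emptyDir keeps them out of the container's writable layer and bounds their growth (a sketch; the volume name, mount path, and cap are illustrative):

volumes:
  - name: app-logs
    emptyDir:
      sizeLimit: 500Mi         # pod is evicted if the volume exceeds this
containers:
  - name: app
    volumeMounts:
      - name: app-logs
        mountPath: /var/log/app   # illustrative path

With the limit above, exceeding the cap evicts only the offending pod instead of pushing the whole node into disk pressure.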

Plus:

  • Log to stdout (collected by Fluentd)
  • Configure container log rotation
  • Monitor node disk usage with alerts at 70%
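For the stdout path, the kubelet can enforce log rotation itself. A sketch (sizes are illustrative, not our production values):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi"   # rotate each container log at 10 MiB
containerLogMaxFiles: 5       # keep at most 5 rotated files per container

The 70% disk alert can be expressed against node_exporter metrics, e.g. alerting when 1 - node_filesystem_avail_bytes / node_filesystem_size_bytes exceeds 0.70 for the node's root filesystem.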

Lesson: Set ephemeral-storage limits. Always.
