We added Istio for observability. It added about 15ms to every service-to-service hop.

The math:

  • Sidecar proxy: ~15ms per hop
  • Average user request: 50 inter-service calls
  • Total overhead: 50 × 15ms = 750ms (the calls run one after another)
  • Base latency: 200ms
  • New latency: 200ms + 750ms = 950ms
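Here is the same arithmetic as a small model. It is a minimal sketch that assumes a constant per-hop cost and that all 50 calls sit on the request's critical path, one after another. If calls fan out in parallel, the total grows with the depth of the call chain, not the number of calls. The function names and the 5-hop fan-out case are hypothetical.

```python
def mesh_overhead_ms(per_hop_ms: int, hops_on_critical_path: int) -> int:
    """Extra latency from sidecar proxies on the hops a request must wait for."""
    return per_hop_ms * hops_on_critical_path


def latency_with_mesh_ms(base_ms: int, per_hop_ms: int, hops_on_critical_path: int) -> int:
    """Base latency plus the proxy cost of every hop on the critical path."""
    return base_ms + mesh_overhead_ms(per_hop_ms, hops_on_critical_path)


# Numbers from the post: 15ms per hop, 50 sequential calls, 200ms base.
print(mesh_overhead_ms(15, 50))           # 750
print(latency_with_mesh_ms(200, 15, 50))  # 950

# Hypothetical: the same calls fanned out so the chain is only 5 hops deep.
print(latency_with_mesh_ms(200, 15, 5))   # 275
```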

Why we added Istio:

  • Distributed tracing
  • mTLS everywhere
  • Traffic management
  • Circuit breakers
  • "Everyone uses it"

What we actually needed:

  • Tracing: OpenTelemetry (library, no proxy)
  • mTLS: Not in our threat model
  • Traffic management: Kubernetes Services were sufficient
  • Circuit breakers: Library-based (Resilience4j)
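Library-based tracing needs no proxy because each service carries the trace context forward in the headers of its outgoing requests. The OpenTelemetry SDK does this with the W3C Trace Context `traceparent` header. The following is a stdlib-only Python sketch of that mechanism, written out by hand to show how it works. It is not the OTel API, the helper names are hypothetical, and it simplifies by minting one new span ID per hop.

```python
import re
import secrets

# W3C traceparent: version-traceid-parentid-flags, lowercase hex (version 00 only here).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge service."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace ID and flags; mint a new span ID for this hop."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, _caller_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: dict) -> dict:
    """Headers a service attaches when it calls the next service."""
    incoming = incoming_headers.get("traceparent")
    value = child_traceparent(incoming) if incoming else new_traceparent()
    return {"traceparent": value}


edge = new_traceparent()
hop = outgoing_headers({"traceparent": edge})["traceparent"]
print(edge.split("-")[1] == hop.split("-")[1])  # same trace ID across the hop
```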

The alternative:

  • Removed Istio
  • Added OTel SDK directly
  • Library-based resilience patterns
  • Latency: 950ms → 200ms
  • Cluster resource usage: -40%
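Library-based resilience runs the circuit breaker inside the calling process, so it adds no extra network hop. Resilience4j does this on the JVM. The code below does not use its API. It is a minimal, single-threaded Python sketch of the pattern. It is also simplified: it counts consecutive failures, while Resilience4j uses a failure rate over a sliding window. The breaker opens at a threshold, rejects calls immediately while open, and allows a trial call after a cooldown. The class name, threshold, and timeout values are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"  # cooldown elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0, clock=lambda: now[0])


def flaky() -> str:
    raise ConnectionError("downstream unavailable")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print(breaker.state)               # open
now[0] = 10.0
print(breaker.state)               # half_open
print(breaker.call(lambda: "ok"))  # ok
print(breaker.state)               # closed
```

Injecting the clock keeps the state transitions testable without sleeping. A production breaker would also need locking and a cap on concurrent trial calls.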

Lesson: A service mesh isn't free. Know its cost before you adopt it.