Service Mesh Overhead
We added Istio for observability. It added ~15ms to every inter-service hop.
The math:
- Sidecar proxy: ~15ms per hop
- Average user request: 50 inter-service calls
- Total overhead: 50 × 15ms = 750ms (if the calls run back to back)
- Base latency: 200ms
- New latency: 950ms
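The arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope model: the function name and parameters are made up for illustration, and it assumes every inter-service call sits on the critical path. If calls fan out in parallel, the added latency would be lower than this estimate.

```python
def request_latency_ms(base_ms, hops, per_hop_overhead_ms):
    """Estimate end-to-end latency when each hop pays the proxy overhead in sequence.

    Hypothetical helper for illustration; assumes no calls overlap.
    """
    return base_ms + hops * per_hop_overhead_ms


# Figures from the list above: 200ms base, 50 calls, ~15ms per hop.
overhead_ms = 50 * 15
new_latency_ms = request_latency_ms(200, 50, 15)
print(overhead_ms, new_latency_ms)
```

Plugging in your own hop count and a measured per-hop cost gives a rough upper bound before you commit to a mesh.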
Why we added Istio:
- Distributed tracing
- mTLS everywhere
- Traffic management
- Circuit breakers
- "Everyone uses it"
What we actually needed:
- Tracing: OpenTelemetry (library, no proxy)
- mTLS: Not in our threat model
- Traffic management: Kubernetes Services were sufficient
- Circuit breakers: Library-based (Resilience4j)
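To make "library-based" concrete, here is a minimal, dependency-free sketch of the circuit-breaker pattern running inside the calling process, with no proxy hop. It is not Resilience4j's API, and the class name, parameters, and defaults are assumptions chosen for illustration. A real library adds sliding windows, metrics, and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Illustrative in-process breaker (hypothetical, not a real library's API)."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock          # injectable so tests can control time
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open, call rejected")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call re-opens; otherwise open once the threshold is hit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping an outbound call looks like `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands in for whatever client function you already have. The check is a couple of in-memory comparisons, so it adds no network hop.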
The alternative:
- Removed Istio
- Added OTel SDK directly
- Library-based resilience patterns
- Latency: 950ms → 200ms
- Cluster resource usage: down 40%
Lesson: A service mesh isn't free. Know the per-hop cost before adopting one.