Added L1, L2, and L3 caches for "better performance." Now we have 3 places where data can be wrong instead of 1.

Our brilliant architecture:

  • L1: In-process cache (Guava)
  • L2: Local Redis
  • L3: Distributed Redis cluster
  • Source of truth: PostgreSQL
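
For reference, the read path through a stack like this can be sketched roughly as follows. This is a minimal illustration, not our production code: `CacheLayer` and `LayeredCache` are hypothetical names, plain in-memory maps stand in for Guava and both Redis tiers, and the `source` function stands in for a PostgreSQL query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** One cache layer; a plain map stands in for Guava or Redis in this sketch. */
final class CacheLayer {
    final String name;
    private final Map<String, String> entries = new HashMap<>();

    CacheLayer(String name) {
        this.name = name;
    }

    Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    void put(String key, String value) {
        entries.put(key, value);
    }

    void invalidate(String key) {
        entries.remove(key);
    }
}

/** Read-through lookup: L1, then L2, then L3, then the source of truth. */
final class LayeredCache {
    private final List<CacheLayer> layers;          // ordered innermost (L1) first
    private final Function<String, String> source;  // hypothetical stand-in for a database query

    LayeredCache(List<CacheLayer> layers, Function<String, String> source) {
        this.layers = layers;
        this.source = source;
    }

    String get(String key) {
        for (int i = 0; i < layers.size(); i++) {
            Optional<String> hit = layers.get(i).get(key);
            if (hit.isPresent()) {
                // Backfill the layers closer to the caller that missed.
                for (int j = 0; j < i; j++) {
                    layers.get(j).put(key, hit.get());
                }
                return hit.get();
            }
        }
        // Every layer missed: load from the source and populate all layers.
        String value = source.apply(key);
        for (CacheLayer layer : layers) {
            layer.put(key, value);
        }
        return value;
    }
}
```

Note that a hit in any layer short-circuits the lookup, so a stale entry in L1 hides whatever the outer layers and the database currently hold.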

What went wrong:

  • L3 cache invalidated correctly
  • L2 cache still had old data (different TTL)
  • L1 cache on server A had old data
  • L1 cache on server B had new data
  • Users saw different data on each refresh
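
The failure is easy to reproduce in miniature. The sketch below is a simplified model, not our actual setup: `AppServer` and `StaleReadDemo` are hypothetical names, each server gets its own L1 and L2 maps, and both share one L3 map and one "database" map. A write updates the database and invalidates only L3.

```java
import java.util.HashMap;
import java.util.Map;

/** One application server: private L1 and L2 maps in front of a shared L3 map. */
final class AppServer {
    final Map<String, String> l1 = new HashMap<>();
    final Map<String, String> l2 = new HashMap<>();
    private final Map<String, String> sharedL3;
    private final Map<String, String> database;

    AppServer(Map<String, String> sharedL3, Map<String, String> database) {
        this.sharedL3 = sharedL3;
        this.database = database;
    }

    String read(String key) {
        String value = l1.get(key);
        if (value != null) {
            return value;               // L1 hit: outer layers are never consulted
        }
        value = l2.get(key);
        if (value == null) value = sharedL3.get(key);
        if (value == null) value = database.get(key);
        if (value == null) return null;
        // Populate every layer on the way back.
        l1.put(key, value);
        l2.put(key, value);
        sharedL3.put(key, value);
        return value;
    }
}

final class StaleReadDemo {
    /** Returns what servers A and B serve after a write that only invalidates L3. */
    static String[] run() {
        Map<String, String> l3 = new HashMap<>();
        Map<String, String> db = new HashMap<>();
        db.put("price", "10");

        AppServer a = new AppServer(l3, db);
        AppServer b = new AppServer(l3, db);
        a.read("price");        // warms A's L1 and L2, plus the shared L3
                                // B has not read yet, so its local layers are empty

        db.put("price", "12");  // write to the source of truth
        l3.remove("price");     // only the shared layer is invalidated

        return new String[] { a.read("price"), b.read("price") };
    }
}
```

In this model server A keeps serving the old value from its L1 while server B loads the new one from the database, so a user bouncing between the two sees the data flip on each refresh.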

Debugging hell:

"Which cache layer has the bad data?" became a 2-hour investigation every time.

What we learned:

  • Each cache layer multiplies invalidation complexity
  • TTLs must be coordinated: each inner layer needs a shorter TTL than the layer behind it (L1 < L2 < L3)
  • Need observability into every layer
  • Sometimes one cache is enough
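
Two of these points can be enforced in code. The sketch below is one possible shape, under stated assumptions: `TtlLayer` and `ObservedCache` are hypothetical names, the clock is injected so expiry can be exercised without sleeping, and the per-layer hit counter is a stand-in for whatever metrics system is already in place. The constructor rejects a stack whose TTLs do not grow from inner to outer, and every lookup reports which layer served it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** A cache layer with a fixed TTL and an injectable clock (milliseconds). */
final class TtlLayer {
    final String name;
    final long ttlMillis;
    private final LongSupplier clock;
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    TtlLayer(String name, long ttlMillis, LongSupplier clock) {
        this.name = name;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(String key, String value) {
        values.put(key, value);
        expiresAt.put(key, clock.getAsLong() + ttlMillis);
    }

    /** Returns the value, or null when the entry is absent or expired. */
    String get(String key) {
        Long deadline = expiresAt.get(key);
        if (deadline == null || clock.getAsLong() >= deadline) {
            values.remove(key);
            expiresAt.remove(key);
            return null;
        }
        return values.get(key);
    }
}

/** Checks TTL ordering up front and records which layer answers each lookup. */
final class ObservedCache {
    private final List<TtlLayer> layers;  // ordered innermost (L1) first
    final Map<String, Integer> hitsByLayer = new HashMap<>();

    ObservedCache(List<TtlLayer> layers) {
        for (int i = 1; i < layers.size(); i++) {
            TtlLayer inner = layers.get(i - 1);
            TtlLayer outer = layers.get(i);
            if (inner.ttlMillis >= outer.ttlMillis) {
                throw new IllegalArgumentException(
                    "TTL of " + inner.name + " must be shorter than TTL of " + outer.name);
            }
        }
        this.layers = layers;
    }

    /** Returns the name of the first layer holding a live entry, or "MISS". */
    String servedBy(String key) {
        for (TtlLayer layer : layers) {
            if (layer.get(key) != null) {
                hitsByLayer.merge(layer.name, 1, Integer::sum);
                return layer.name;
            }
        }
        hitsByLayer.merge("MISS", 1, Integer::sum);
        return "MISS";
    }
}
```

Logging the result of something like `servedBy` next to each response is what would turn "which cache layer has the bad data?" from an investigation into a log query.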

Lesson: More cache layers ≠ better performance. It equals more places to debug when things go wrong.

