SRE Weekly Issue #515

A message from our sponsor, atscaleconference.com:

Building scalable, high-performance infrastructure for AI is one of today’s toughest challenges. Join @Scale: Systems & Reliability on June 25 in Bellevue, WA to learn how leading engineers are solving it.

Secure your seat today!

Why Reliability Metrics Age Faster Than the Systems They Measure

Is your dashboard always green because everything is working, or because your metrics are lying?

  Barnadeep Bhowmik — Stackademic

But when we rolled out the new query, disk writes doubled and Write-Ahead Logging (WAL) syncs quadrupled. We discovered that even when an upsert doesn’t change any values, it still locks the conflicting row, which is recorded in the WAL.

Yikes! Click through to learn how they figured it out and what they did about it.

  Anthonin Bonnefoy — Datadog

It’s important not just to try to prevent incidents but to be fully ready for them when they inevitably happen anyway.

  Joe Mckevitt — Uptime Labs

Queues absorb spikes but not sustained overload. Without backpressure, limits, and monitoring, backlogs grow until systems fail.

  David Iyanu Jonathan — DZone
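
To make the backpressure point concrete, here is a minimal Python sketch of a queue with a hard depth limit that rejects new work instead of letting the backlog grow. It is not taken from the linked article: the `BoundedWorkQueue` class and its method names are hypothetical, and a real system would also need retry or load-shedding logic on the producer side.

```python
import queue


class BoundedWorkQueue:
    """Hypothetical wrapper: a queue with a hard limit that refuses work when full."""

    def __init__(self, max_depth):
        self._q = queue.Queue(maxsize=max_depth)
        self.rejected = 0  # counter to export to monitoring

    def submit(self, item):
        """Try to enqueue without blocking; return False to signal backpressure."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest item; raises queue.Empty if none."""
        return self._q.get_nowait()

    def depth(self):
        """Current backlog size, another number worth graphing."""
        return self._q.qsize()


work = BoundedWorkQueue(max_depth=3)
accepted = [work.submit(i) for i in range(5)]
```

In this sketch the first three submissions are accepted and the last two are refused, so the caller learns about the overload right away instead of finding an unbounded backlog later.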

Oof. The code exhausted all ephemeral ports and then logged itself to death complaining about it. I love the workaround. Loopback is a /8!

  Jim Calabro — Bluesky
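
The blurb doesn't spell out the workaround, so the following is only an illustration of why "loopback is a /8" matters, not a description of what Bluesky did. A TCP connection is identified by source address, source port, destination address, and destination port, so using a different 127.x.y.z address opens up a fresh range of ephemeral ports. The helper name `loopback_address` is hypothetical. On Linux the whole 127.0.0.0/8 block is normally reachable on the loopback interface; other systems may need the extra addresses configured first.

```python
import ipaddress

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def loopback_address(i):
    """Return the i-th usable address in 127.0.0.0/8, wrapping around.

    Skips the network address (127.0.0.0) and the broadcast
    address (127.255.255.255).
    """
    usable = LOOPBACK.num_addresses - 2
    return str(LOOPBACK.network_address + 1 + (i % usable))


total = LOOPBACK.num_addresses
first_three = [loopback_address(i) for i in range(3)]
```

A client or server could rotate through these addresses when binding sockets, multiplying the number of distinct connections available before ports run out.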

…and here’s an awesome analysis and explanation of the Bluesky writeup. I’ve definitely been down the path of scratching my head about EADDRINUSE before.

  Lorin Hochstein

AI didn’t solve the problem for them, but it sped up the grunt-work and significantly reduced their iteration time, letting them get to an answer much faster.

  Tristan Streichenberger — Mixpanel

It’s interesting to me that this is essentially an outage/degradation report, but the definition of system degradation for an LLM tool is much more subjective than with traditional services. The “ablation testing” concept is really neat.

  Anthropic

Updated: May 3, 2026 — 9:55 pm
A production of Tinker Tinker Tinker, LLC