SRE Weekly Issue #511

A message from our sponsor, Depot:

CI was designed for humans who context-switch while waiting. Agents don’t. They’re just blocked. Depot CEO Kyle Galbraith on how they re-imagined Depot CI to close the loop: run against local patches, rerun a single job, SSH into the runner to check reality. Per-second billing, no minimums.

Run depot ci migrate

This one’s definitely going to be good to keep in mind during my next incident.

FYI for folks with no or low vision, there’s a screenshot of J. Paul Reed quoting Vanessa Huerta Granda: “Incidents are where engineers are made”.

  Stuart Rimell — Uptime Labs

Etsy migrated a 1,000-table DB with 1,000 shards (with their own custom ORM!) over to Vitess, and it took some care, especially in how they handled transactions.

  Ella Yarmo-Gray — Etsy

Wow, this one sure hits hard.

  Kenneth Eversole

The section on lessons learned toward the end of this debugging story is a goldmine.

  Lokesh Soni

How do you ensure reliability in a system you can’t access? How can you monitor SLIs/SLOs without metrics?

  Alex Ewerlöf

I love a good debugging story, and this one delivers, with a confluence of gnarly problems and lessons we can all learn from.

  James Sawyer — Phantom Tide

Oof, what a nasty little gotcha in the API call at the heart of this incident.

  David Tuber and Dzevad Trumic — Cloudflare

Lorin’s Law strikes again!

System intended to improve reliability contributed to incident

  Lorin Hochstein

Updated: April 5, 2026 — 10:15 pm
A production of Tinker Tinker Tinker, LLC