SRE Weekly Issue #493

A message from our sponsor, Shipfox:

Shipfox supercharges GitHub Actions – no workflow changes, 30-min setup.

  • 2x faster builds with better CPU, faster disks & high-throughput caching
  • 75% lower costs with shorter jobs and better price-per-performance
  • Full CI observability with test/job speed and reliability

👉 See how it works: https://shipfox.io?utm_source=SREWeekly&utm_campaign=issue493

I like how this goes deep on the ways proxies can manage many connections at once, like SO_REUSEPORT.

  Mitendra Mahto

Here I want to talk about two classes of problems where accountability is a poor solution to addressing the problem, where the OceanGate accident falls into the second class.

  Lorin Hochstein

This one has so many lessons we can learn from that it might as well be about IT infrastructure.

Research from high-reliability organizations reveals that individual errors are almost always symptoms of deeper systemic problems.

  Muhammad Abdullah Khan — KevinMD.com

But Vibe Coding introduces real risks, particularly around resilience, that are worth examining before we place too much faith in it.

I really appreciate the way the author methodically lays out their points, including through the concept of competitive and complementary artifacts.

  Stuart Rimell — Uptime Labs

I really enjoyed the part about the Google interview question. That one’s going to have me thinking for a while.

  Jos Visser

Here’s a great overview of why time and ordering are so important (and difficult) in distributed systems.

  Sid — The Scalable Thread

I really like the way the author teases apart the true, practical meaning of “eventual consistency”. The example of Amazon shopping carts is especially illuminating.

  Uwe Friedrichsen

Updated: September 7, 2025 — 9:59 pm
A production of Tinker Tinker Tinker, LLC