SRE Weekly Issue #521

In incidents, swarming is a feature, not a bug

Spontaneous swarming of responders might seem like a nuisance that breaks our tidy mental models of incident response, but it’s actually very powerful. It’s something to facilitate and encourage, not simply tolerate.

Brent Chapman

Exactly Once Processing: Myth vs Reality

The misconception is that the local assurances automatically combine to form a single end-to-end promise that spans brokers, processors, databases, outboxes, caches, webhooks, and external APIs.

Irullappan Irulandi — DZone

How we reduced core unit boot time from hours to minutes

When a firmware issue caused the reboots needed for firmware upgrades to take four hours(!), Cloudflare had to find a solution.

Giovanni Pereira Zantedeschi, Nnamdi Ajah, and Omar Sheik-Omar — Cloudflare

AI enthusiasts are in a race against time, AI skeptics are in a race against entropy

This one strikes a balance on AI that really speaks to me.

If you’re the one left holding the bag, you should generally get final say over what goes in that bag.

Charity Majors

Sitar-agent: Building a reliable dynamic configuration sidecar at scale

How Airbnb built a Kubernetes sidecar to deliver dynamic configuration reliably at scale.

Bo Teng — Airbnb

When failover isn’t safe: Building high-availability PostgreSQL on Kubernetes

In this post, we’ll walk through how we redesigned our Kubernetes-based PostgreSQL clusters for failover safety, how we balanced durability against latency, and what we learned while validating this approach through benchmarking and failure testing.

Shree Sampath — Datadog

When Claude changed, everything changed: Managing AI blast radius in production

The failure mode on this one is really interesting, and the bit about “infinite blast radius” caught my eye.

Sarat Mahavratayajula and Vijay Sagar Gullapalli — VentureBeat

Why we need resilient software design – Part 2

I’m enjoying this series so far, and I’m looking forward to reading the rest. It’s worth starting at part 1, but part 2 can stand on its own in a pinch.

Uwe Friedrichsen
