SRE Weekly Issue #507

The Real State of Helm Chart Reliability (2025): Hidden Risks in 100+ Open‑Source Charts

There’s a lot you can get out of this one even if you don’t happen to use any of the Helm charts they evaluated. Their evaluation criteria are useful and easy to apply to other charts, and they also make a great study guide for those new to Kubernetes.

Prequel

The dangers of SSL certificates

This is the best explanation I’ve seen yet of exactly why SSL certificates are so difficult to get right in production.

Lorin Hochstein

What Other Industries Can Teach Tech About Crisis Simulation

An article on the importance of incident simulation for training, drawing on how other industries use simulations.

Stuart Rimell — Uptime Labs

People cannot “just pay attention” to (boring, routine) things

I especially like the discussion of checklists, since they are often touted as a solution to the attention problem.

Chris Siebenmann

No record left behind: How WarpStream can withstand cloud provider regional outages

This is a new product/feature announcement, but it also has a ton of implementation detail. It’s really neat to see how they built tolerance for cloud provider regional failures into WarpStream.

Dani Torramilans — WarpStream

The Reliability-Cost Inversion Law: Why Reliability Gets Cheaper at Scale

It’s interesting to think of money spent on improving reliability as offsetting the cost of responding to incidents. It’s not one-to-one, but there’s an argument to be made here.

Florian Hoeppner

Quiet Influence: A Guide to Nemawashi in Engineering

An explanation of the Nemawashi principle for building buy-in for your initiatives. This isn’t SRE-specific, but we so often find ourselves seeking buy-in for our reliability initiatives.

Matt Hodgkins

Customer Experience: The Reliability metric that matters

The next time you’re flooded with alerts, ask yourself: Does this metric reflect customer pain, or is it just noise? The answer could change how you approach reliability forever.

Spiros Economakis

A message from our sponsor, incident.io: