SRE Weekly Issue #344

Articles

In this story of SLOs gone bad, error budgets and code freezes provided a perverse incentive that caused a great deal of harm.

dobbse.net

Taking a Page From Site Reliability Engineering

This article seeks to apply SRE principles to security in the form of a Threat Budget.

Jason Bloomberg — Intellyx

How an incident management tool helps you conquer response challenges

After talking to hundreds of engineers about their processes, we’ve identified five of the most common challenges we see across companies looking to put more structure behind how they manage their incidents.

Mike Lacsamana — FireHydrant

Incident Review: Shepherd Cache Delays

The Analysis section has a lot of important lessons. What really stands out in this incident review is that Honeycomb plainly states that they don’t yet know what went wrong, and why not.

Fred Hebert — Honeycomb
Full disclosure: Honeycomb is my employer.

Staging is a trap

Several small staging clusters, each fit for its purpose, offer a more maintainable, cheaper alternative.

Tyler Cipriana

The Case of the Missing Fuel: The story of the Stockport air disaster

I’m really enjoying the Admiral Cloudberg series of aircraft accident investigation reports. How did I not know about these before??

A lot has improved in aviation safety since this crash in 1967, but there’s still a lot we can learn in SRE even now. For example: the operator’s view into the system should make the result of their inputs clear.

Admiral Cloudberg

How We Found Azure’s Unannounced Breaking Change

An unannounced (maybe inadvertent?) breaking change in an Azure API caused an outage. Here’s the story of the investigation.

Nikko Campbell — Metrist

Value for Money: The crash of ValuJet flight 592

Another Admiral Cloudberg air accident investigation, this time showing how easily critical details can slip through the cracks.

Admiral Cloudberg
