SRE Weekly Issue #421

Last week, I mistakenly attributed [an article](https://www.paigerduty.com/sre-biggest-problem/) to PagerDuty. Actually, it was by Paige Cruz, whose clever blog name I didn’t pay anywhere near close enough attention to! Thanks to the several readers who nudged me gently about my error.

The problem with invariants is that they change over time

If you’ve been in this business long enough, you’ve almost certainly run into an incident where one of the contributors was an implicit invariant that was violated by a new change.

Easily the majority of incidents I’ve been in.

Lorin Hochstein

The TwinSLO Proposal

This article is about trying to solve for this problem:

a potentially significant number of customers or queries can be affected by an outage and this won’t trigger an SLO violation.

Niall Murphy

An Anonymous Complaint/Dr. Poston’s Response

A surgeon grapples with the difficulty of building a culture of retrospectives and introspection in his surgical team by running a fascinating retro on himself in this blog post.

Robert Poston, MD

Incidents and the requirement of slowing down

An argument for buying yourself time to slow down and make decisions carefully, as a way of ultimately speeding up incident resolution.

Shayon Mukherjee

Build your own role-playing game: the business continuity plan drill

Disasters threatening a business’ ability to operate core functions don’t occur that often (phew!), but we do want to ensure we are prepared to keep our business running if they do. To practice disaster response skills, we run business continuity drills, and you can too with our 10-step plan!

Janna Brummel — WeTransfer

Availability Archetypes

How people think about reliability varies between companies. Which of the four different perspectives laid out in this article does your company fit into, if any?

Ross Brodbeck

eu1 ingest and UI down

Honeycomb posted this follow-up on their April 9 outage, explaining what went wrong and how they’re responding.

Honeycomb

Full disclosure: Honeycomb is my employer.

For an SRE, relationships and communication matter most: advice from SRE’s

The author of this article posed a question on r/sre:

What matters most for your success as an SRE?

They share a summary of the answers they got, with their commentary.

Nočnica Mellifera — Checkly
