SRE Weekly Issue #388

Articles

Operating effectively in high surprise mode

This article makes a cool analogy between designing systems to operate well under unexpected load and designing socio-technical systems that operate well when the people in them are surprised by what the system is doing.

Lorin Hochstein

10 service level agreement practices you should implement

If you need to create SLAs, this article has some solid advice on how to go about it — and what to avoid.

incident.io

Prometheus scrape failures can cause alerts to be ‘resolved’

If Prometheus can’t scrape your service, an alert can get resolved incorrectly — and that can happen exactly when your service is failing!

Chris Siebenmann

A Spectrum of Actions

A really nifty three-part exploration of action items in the aftermath of an incident. Rather than consider cost/benefit, this series proposes that we think about the likelihood of an action item being completed.

J. Paul Reed

Is Northern Virginia Really the Least Reliable AWS Region And Why?

Yes, as it turns out — and these folks have the receipts (along with some theories as to why).

Colin Bartlett

Reader: Insight and Incidents

The “wow” moment in this article is under the heading, “What can we learn from creative desperation?”

Eric Dobbs — Learning From Incidents

How to create automated paging and on-call at your startup

Before explaining how they set up their on-call, these folks share why they avoided it in the early stages of their startup, and what made them finally take the plunge.

Dustin Brown — DoltHub

The Dark Side of SRE

For the good of the profession, the SRE community still needs to coalesce around more consistent job ladders, expectations, and competencies.

Code Reliant

Incident Review: What Comes Up Must First Go Down

Honeycomb had their worst incident ever at the end of July, and in their characteristic style, they’ve posted an incredibly detailed analysis of what happened — and that’s just the blog post. Then you can click through for a 17-page PDF with lots more detail.

Fred Hebert — Honeycomb
Full disclosure: Honeycomb is my employer.
