SRE Weekly Issue #251

Happy New Year!

A message from our sponsor, StackHawk:

Still looking for a good New Year's resolution? How about adding application security testing to your CI/CD pipeline with StackHawk? Get started with our free account.
https://sthwk.com/freeplan

Articles

Tips and tricks for writing effective runbook documentation when you aren’t a technical writer

I like the discussion of the “Curse of Knowledge” cognitive bias.

Taylor Barnett — Transposit

Here’s one engineer’s SLO journey.

My main focus is on how I educated myself about SLOs and how I applied this to my organization.

Ioannis Georgoulas

This blog is a redacted internal memo that aimed to familiarize its audience with SLOs, explain the value of an SLO culture, and describe how we would implement and roll them out.

Thomas Césaré-Herriau — Brex

Why would you do this? It’s all about Conway’s Law.

Ben Nadel

The folks at Adaptive Capacity Labs have seen a few patterns crop up over and over in their post-incident reviews. How many of these have you seen before?

John Allspaw — Adaptive Capacity Labs

Lots of complex contributing factors led to the main character being left behind in the movie Home Alone… so let’s treat it like a production incident!

Fred Hebert

This one includes a complex timeline showing the interplay of two pairs of bugs, where one in each pair masked the other.

Lorin Hochstein

Outages

Updated: January 3, 2021 — 8:11 pm
A production of Tinker Tinker Tinker, LLC