SRE Weekly Issue #312

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating the incident channel, Jira ticket, and Zoom call, paging the right team, building the postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt):


There’s a really great discussion of “pilot error” at the end of this air accident summary video.

  Mentour Pilot

There are some really great names and talks on the agenda for this half-day virtual conference on April 1.


This article is about building a framework, rather than using one off-the-shelf, to ensure that it’s tailored to the needs of your organization.

  Ethan Motion

When are you smarter than your playbooks, and when are your playbooks smarter than you?

  Andre King — Rootly
This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.

This one is about piecing together the story of how an incident unfolded. One interviewee might mention something new, and then you can ask later interviewees about it.

  Cory Watson — Jeli

All about alert fatigue: how to recognize it and how to fix it once you notice it.

  Emily Arnott — Blameless

This one includes a summary of their February 2 outage:

[…] a routine deployment failed to generate the complete set of integrity hashes needed for Subresource Integrity. The resulting output was missing values needed to securely serve Javascript assets on

  Jakub Oleksy — GitHub

Following on last week’s article about the term “postmortem”, this one has even more great reasons to pick a different word.


This article recommends a two-stage approach to writing an incident retrospective report: a “calibration document” and then the final report.

  Thai Wood — Jeli


  • Tasmania
  • Discord
    • Something’s on fire! We’re looking into it, hang tight.

Updated: March 6, 2022 — 8:53 pm