SRE Weekly Issue #339

It’s with great sadness that I note the passing of a giant in our field, Dr. Richard Cook. His memory will live on through his huge body of work and the countless ways he’s impacted our thinking and practice as SREs.

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒.

Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?


Here’s a wonderful tribute to the many ways Dr. Cook has advanced our field and others.

  John Allspaw — Adaptive Capacity Labs

This seems like a fitting time to feature Dr. Cook’s seminal treatise here again.

  Dr. Richard Cook

A good argument could be made either way, but what really caught my eye was this (emphasis mine):

Responding to incidents should distract as few people as reasonably possible. Organisations should be shooting for minimum viable participation, whilst still responding effectively, to allow them to retain focus.

  Chris Evans —

Noticing a correlation between the adoption of SRE and cloud repatriation (moving apps out of the cloud), the author of this article asks, is there causation?

  Lori MacVittie —

I like the line this article draws between incident retrospectives and developing a PRR process, and also the emphasis on psychological safety.

Incidents reveal what your organization is good at and what needs improvement in your PRR processes.

  Nora Jones — Jeli

Aperture is a new open source tool that helps you prevent cascading failures using load-shedding and rate limiting.

BONUS CONTENT: Here’s their article explaining how it works.


Updated: September 18, 2022 — 8:51 pm
A production of Tinker Tinker Tinker, LLC