It’s with great sadness that I note the passing of a giant in our field, Dr. Richard Cook. His memory will live on through his huge body of work and the countless ways he’s impacted our thinking and practice as SREs.
Articles
Here’s a wonderful tribute to the many ways Dr. Cook has advanced our field and others.
John Allspaw — Adaptive Capacity Labs
This seems like a fitting time to feature Dr. Cook’s seminal treatise here again.
Dr. Richard Cook
A good argument could be made either way, but what really caught my eye was this (emphasis mine):
Responding to incidents should distract as few people as reasonably possible. Organisations should be shooting for minimum viable participation, whilst still responding effectively, to allow them to retain focus.
Chris Evans — incident.io
Noticing a correlation between the adoption of SRE and cloud repatriation (moving apps out of the cloud), the author of this article asks whether there is causation.
Lori MacVittie — DevOps.com
I like the line this article draws between incident retrospectives and developing a PRR process, and also the emphasis on psychological safety.
Incidents reveal what your organization is good at and what needs improvement in your PRR processes.
Nora Jones — Jeli
Aperture is a new open source tool that helps you prevent cascading failures using load-shedding and rate limiting.
BONUS CONTENT: Here’s their article explaining how it works.
FluxNinja
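For readers new to the two techniques named above, here is a minimal sketch of how rate limiting and load-shedding can fit together. It is a generic token bucket and not Aperture's API or algorithm. The names `TokenBucket`, `handle`, and `backend` are hypothetical and chosen only for illustration.

```python
import time


class TokenBucket:
    """Generic token-bucket rate limiter (illustrative, not Aperture's implementation)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable clock, handy for testing
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True if the request fits within the current rate budget."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def handle(bucket, request, backend):
    """Shed load early: reject with 503 rather than queueing work the backend cannot absorb."""
    if not bucket.allow():
        return 503, "shed"
    return 200, backend(request)
```

Rejecting excess requests quickly, rather than letting them pile up, is one way an overloaded service can avoid dragging down the services that depend on it.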