This is an article version of an interview on NPR’s Fresh Air with Dr. Danielle Ofri, author of a new book, When We Do Harm. I especially loved the part about near misses.
Bridget Bentz, Molly Seavy-Nesper, Deborah Franklin, Sam Briger, and Thea Chaloner — NPR
Maintenance of the logging system had unintended downstream effects, including log loss and failure of the system that manages dynos.
In this incident, a TLS certificate was deployed without its intermediate, resulting in failures for some clients.
I wrote this after attending the Resilience Engineering Association’s webinar with panelists Dr. Richard Cook, John Allspaw, and Nora Jones, moderated by Laura Maguire. Once the recording is posted, I highly recommend watching!
As SREs, we need to be laser focused on the user’s experience. Our SLIs should reflect that.
Emily Arnott — Blameless
This two-part series is an in-depth look at how Twitter adopted SRE, before SRE was even a thing.