Articles
This article hints that blame and sanction (punishment) are two different things.
Bonus content: Dr. Richard Cook on blameless vs sanctionless retrospectives
Bob Reselman
Here are a few lessons in operations that we all eventually have to learn, often the hard way.
Jan Schaumann
I especially like the emphasis on reducing pager fatigue through thoughtfully selected SLOs.
Emily Arnott — Blameless
The four concepts, drawn from a paper by Dr. David Woods, are:
- Rebound
- Robustness
- Graceful extensibility
- Sustained adaptability
Thai Wood — Resilience Roundup
Understanding the difference between work-as-imagined and work-as-done is critical to the reliability of a complex system.
Jaime Woo and Emil Stolarsky — The Morning Mind-Meld
There’s a useful survey in here if you’re trying to measure or track toil in your organization.
Eric Harvieux — Google
A nice little debugging story hinging on a bug in an upstream library.
Sanket Patel
Outages
- Microsoft Office 365 Sharepoint Online
- TD Bank
- Google Drive, Docs, Sheets, and Slides
- Facebook and Instagram
- Gandi
  - They posted a candid analysis, concluding that they're not sure what went wrong.