Articles
This one advocates for looking beyond “root cause” when analyzing an incident, and instead finding Themes and Takeaways.
If it can be solved with a pull request it’s not a takeaway.
Vanessa Huerta Granda – Jeli
In this juicy incident, the Incident Commander’s intimate knowledge of a similar failure mode kept the response fixated on the wrong explanation and away from the true cause.
Fred Hebert – Honeycomb
[…] the more we normalize lower-impact incidents, the more confidence and experience we build for Sev1 situations.
Dan Condomitti – The New Stack
Want to compensate folks extra for on-call work? This tool connects to PagerDuty to do all the heavy lifting for you.
Lawrence Jones – incident.io
This Reddit post in r/sre has some really great stories in the comments.
various users – Reddit
Along with the “why”, this article also goes into the “how”.
Martha Lambert – incident.io
Early in my career, I had to write a raw IP packet generator to reproduce a DoS attack so that I could mitigate it. It’s fun!
Julia Evans
In an incident in July, a cloud provider change broke provisioning for new Codespaces VMs, taking down the service.
Jakub Oleksy – GitHub
Put Safety First and Minimize the 12 Common Causes of Mistakes in the Aviation Workplace
FAA (US Federal Aviation Administration)