This one advocates for looking beyond “root cause” when analyzing an incident, and instead finding Themes and Takeaways.

If it can be solved with a pull request, it’s not a takeaway.

  Vanessa Huerta Granda - Jeli

In this juicy incident, the Incident Commander’s intimate knowledge of a similar failure mode led the response to fixate on that failure mode, steering it away from the true cause.

  Fred Hebert - Honeycomb

[…] the more we normalize lower-impact incidents, the more confidence and experience we build for Sev1 situations.

  Dan Condomitti - The New Stack

Want to compensate folks extra for on-call work? This tool connects to PagerDuty to do all the heavy lifting for you.

  Lawrence Jones

This Reddit post in r/sre has some really great stories in the comments.

  various users - Reddit

Along with the “why”, this article also goes into the “how”.

  Martha Lambert

Early in my career, I had to write a raw IP packet generator to reproduce a DoS attack so that I could mitigate it. It’s fun!

  Julia Evans

In an incident in July, a cloud provider change broke provisioning for new Codespaces VMs, taking down the service.

  Jakub Oleksy - GitHub

Put Safety First and Minimize the 12 Common Causes of Mistakes in the Aviation Workplace

  FAA (US’s Federal Aviation Administration)

