An engineer’s observation of a really effective Incident Command pattern.
Here’s Lorin Hochstein’s take on the STAMP (Systems-Theoretic Accident Model and Processes) workshop he attended recently.
What’s the difference between Resilience Engineering and High Reliability Organizations? This paper (and its excellent summary) explains.
Torgeir Haavik, Stian Antonsen, Ragnar Rosness, and Andrew Hale (original paper)
Thai Wood — Resilience Roundup (summary)
This one focuses on parts of SRE that I feel are really important, taken from the article’s subheadings:
- Vendor engineering
- Product engineering
- Sociotechnical systems engineering
- Managing the portfolio of technical investments
Charity Majors — Honeycomb
Now that’s a for-serious incident report. Nice one, folks! This is an interesting case of theory-meets-reality for disaster planning.
giles — PythonAnywhere