SRE Weekly Issue #232

A message from our sponsor, StackHawk:

Is your company adopting GraphQL? Adding security testing is simple. Watch this 20-minute walkthrough to see how easy it is to get up and running!

https://www.youtube.com/watch?v=--liu7LCs5A

Articles

An engineer’s observation of a really effective Incident Command pattern.

Dean Wilson

Here’s Lorin Hochstein’s take on the STAMP (Systems-Theoretic Accident Model and Processes) workshop he attended recently.

Lorin Hochstein

What’s the difference between Resilience Engineering and High Reliability Organizations? This paper (and excellent summary) explains.

Torgeir Haavik, Stian Antonsen, Ragnar Rosness, and Andrew Hale (original paper)

Thai Wood — Resilience Roundup (summary)

This one focuses on what I feel are really important parts of SRE, taken from the article’s subheadings:

  • Vendor engineering
  • Product engineering
  • Sociotechnical systems engineering
  • Managing the portfolio of technical investments

Charity Majors — Honeycomb

Now that’s a for-serious incident report. Nice one, folks! This is an interesting case of theory-meets-reality for disaster planning.

giles — PythonAnywhere

Outages

Updated: August 23, 2020 — 8:49 pm
A production of Tinker Tinker Tinker, LLC