Articles
This is an in-progress document about the kinds of patterns we see or use when designing systems. The author warned me that it’s a work in progress and maybe not ready for prime-time, but I think this is exactly the time when I should get it in front of your eyes.
I’d love your help growing this list. If you know of a name that is missing from the list, please send me a tweet with the name and a short description of it, and I’ll include it in the list with a link to your tweet.
Mads Hartmann
Whoa, a podcast dedicated to picking apart public incident postings! I love this, because there’s a lot that’s left to shorthand, and a live conversation is a great way to flesh it out.
Tom Kleinpeter and Jamie Turner
There’s a really interesting undercurrent in this story about resilience. Nurses can catch these kinds of errors, but that is just one layer of protection among many. If the system is reduced to relying on that second-layer defense, its overall resilience is diminished.
Daniel Keane — ABC News
Of course, before reaching this stage, all of the pieces are tested in isolation. But until they’re all put together, it’s almost impossible to predict the behavior of the finished product during an accident.
Mikolaj Pawlikowski
The attributes discussed are:
- Problem solving
- Awareness building
- Collaboration
- Empathy
Jayne Groll
Wait, more attributes? Oh, and by the same author, too:
- “Great SREs have a passion for high-quality automation.”
- “A great SRE ensures SLOs (Service Level Objectives) are set at correct boundaries of service; […]”
- Prize communication.
- Look for longer-term support experience.
- Look for a person that demonstrates empathy.
Jayne Groll
This one explores the application of SRE principles to mobile app design.
Abhijith Krishnappa
This two-part series uses a narrative case study format to show how SLOs can be misleading. You might have great numbers, but what are the numbers actually measuring?
Adam Hammond — Squadcast
Outages
- A major US oil pipeline
  - The pipeline was targeted by a ransomware attack.
- GasBuddy
  - This app for finding gasoline prices seems to have been impacted by a flood of user traffic driven by the US oil pipeline outage. In fact, their front page seems to be very slow for me as I write this.
- Salesforce
  - The outage was widespread and even affected their status page.
- eBay
- Microsoft Outlook