SRE Weekly Issue #270

A message from our sponsor, StackHawk:

APIs are not only the backbone of modern application architecture, but they are also a key part of security. Discover what API security testing is and how it works, and get started using API security tools.
http://sthwk.com/API-security

Articles

This is an in-progress document about the kinds of patterns we see or use when designing systems. The author warned me that it’s a work in progress and maybe not ready for prime-time, but I think this is exactly the time when I should get it in front of your eyes.

I’d love your help growing this list. If you know of a name that is missing from the list please send me a tweet with the name and a short description of it and I’ll include it in the list with a link to your tweet

Mads Hartmann

Whoa, a podcast dedicated to picking apart public incident postings! I love this, because there’s a lot that’s left to shorthand, and a live conversation is a great way to flesh it out.

Tom Kleinpeter and Jamie Turner

There’s a really interesting undercurrent in this story about resilience. Nurses can catch these kinds of errors, but this is just one layer of protection among many. If the system is reduced to relying on that second-layer defense, the overall resilience is diminished.

Daniel Keane — ABC News

Of course, before reaching this stage, all of the pieces are tested in isolation. But until they’re all put together, it’s almost impossible to predict the behavior of the finished product during an accident.

Mikolaj Pawlikowski

The attributes discussed are:

  • Problem solving
  • Awareness building
  • Collaboration
  • Empathy

Jayne Groll

Wait, more attributes? Oh, and by the same author, too:

  • “Great SREs have a passion for high-quality automation.”
  • “A great SRE ensures SLOs (Service Level Objectives) are set at correct boundaries of service; […]”
  • Prize Communication.
  • Look for longer-term support experience.
  • Look for a person that demonstrates empathy.

Jayne Groll

This one explores the application of SRE principles to mobile app design.

Abhijith Krishnappa

This two-part series uses a narrative case study format to show how SLOs can be misleading. You might have great numbers, but what are the numbers actually measuring?

Adam Hammond — Squadcast

Outages

  • A major US oil pipeline
    • The pipeline was targeted by a ransomware attack.
  • GasBuddy
    • This app for finding gasoline prices seems to have been impacted by a flood of user traffic driven by the US oil pipeline outage. In fact, their front page seems to be very slow for me as I write this.
  • Salesforce
    • The outage was widespread and even affected their status page.
  • eBay
  • Microsoft Outlook
Updated: May 16, 2021 — 8:55 pm
A production of Tinker Tinker Tinker, LLC