SRE Weekly Issue #226

A message from our sponsor, StackHawk:

When a team introduces security bugs, they don’t know because nothing tells them. We test for everything else… why not security bugs?
https://www.stackhawk.com/blog/how-security-based-development-should-work?utm_source=SREWeekly

Articles

This is an article version of an interview with Dr. Danielle Ofri, author of the new book When We Do Harm, on NPR’s Fresh Air. I especially loved the part about near misses.

Bridget Bentz, Molly Seavy-Nesper, Deborah Franklin, Sam Briger, and Thea Chaloner — NPR

Maintenance of the logging system had unintended downstream effects, including log loss and the failure of the system that manages dynos.

In this incident, a TLS certificate was deployed without its intermediate certificate, causing failures for some clients.

I wrote this after attending the Resilience Engineering Association’s webinar with panelists Dr. Richard Cook, John Allspaw, and Nora Jones, moderated by Laura Maguire. Once the recording is posted, I highly recommend watching it!

Lex Neva

As SREs, we need to be laser-focused on the user’s experience. Our SLIs should reflect that.

Emily Arnott — Blameless

This two-part series is an in-depth look at how Twitter adopted SRE, before SRE was even a thing.

Blameless

Outages

Updated: July 5, 2020 — 9:21 pm
A production of Tinker Tinker Tinker, LLC