At first, getting internal buy-in for SRE efforts can be difficult. “Build the Resilient Future Faster: Creating a Culture of Reliability” shows you exactly how, and why, we created and implemented our own culture of DevOps and SRE.


Read about their transition from multi-cloud to all AWS and how they scaled to 10x the login throughput.

Dirceu Tiegs — Auth0

This article on the emergent behavior of algorithms is well worth thinking about as an SRE. Even without machine learning, our infrastructures have complex emergent behaviors, as you can read in any incident retrospective.

Andrew Smith — The Guardian

This interesting pitfall of chaos engineering stood out to me:

[…] if you hand a team 50 vulnerabilities, they’re probably not going to fix any of them. You know what I mean? So you have to figure out a way to prioritize those …

Andrea Echstenkamper with Nora Jones (Netflix), Ted Strzalkowski (LinkedIn), and Pat Higgins (Gremlin)

Well worth a quick listen (2 minutes 30 seconds).

Todd Conklin — Pre-Accident Podcast

In this series, we’ll dig into different types of observability tools. For each type, we’ll cover what they’re used for, what specific tools are available, some use cases, and any unique characteristics that may come up during your search for a new tool.

Linked above is an introduction to the article series. The first in the series is also out, focusing on time-series metric systems.

Dan Barker


