SRE Weekly Issue #389

A message from our sponsor, Rootly:

When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. Learn the essentials of customer-facing incident communication in Rootly’s latest blog post:


Here are four of the lessons I learned that should help you build a successful SRE organization.

  1. Focus on Developer Training
  2. Focus on the Right Abstractions
  3. Focus on Self Service
  4. Automate Yourself Out of a Job

  Sven Hans Knecht

In this blog post, we’ll talk about two incident management structure models, distributed and centralized, including the pros and cons of each and examples of what each structure looks like in our community.

  Robert Ross — FireHydrant

The Rasmussen model conceptualizes the limits of a system along three boundaries: Cost, System Performance, and Human Capacity.

  Nishant Modak — Last9

Wow, this is a really interesting incident. It has all the hallmarks of a nightmare sev1: time pressure, unknown problem, inventing new procedures on the spot, multiple different teams/specialties having to work together, etc.

  Jorg Wenninger — CERN

What do you do when many engineers all need to take the same day off each week for religious reasons?


Toyota recently halted production in their factories due to a problem in their order system, about which they shared some interesting details.


Here’s a guidebook on how to handle being the first SRE at a company.

  Sven Hans Knecht

Updated: September 10, 2023 — 10:21 pm
A production of Tinker Tinker Tinker, LLC