SRE Weekly Issue #432

A message from our sponsor, FireHydrant:

We’ve gone all out on our new integration with Microsoft Teams. If you’re a Microsoft Teams user, FireHydrant now supports the most comprehensive integration for incident management. Run the entire IM process without ever leaving the chat.

https://firehydrant.com/blog/introducing-a-brand-new-microsoft-teams-integration/

In this debugging story, an engineer wielded SystemTap to figure out why a Kafka broker was doing a ridiculous number of reads.

  Terra Field — Honeycomb

  Full disclosure: Honeycomb is my employer.

A concise breakdown of the math involved in getting that extra nine of reliability.

It all boils down to setting the SLOs and requirements that keep your users happy, and nothing more: unnecessary reliability comes at a high cost.

  Thomas Stringer
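To make the "extra nine" arithmetic concrete, here is a small Python sketch that converts an availability target into an allowed-downtime budget. It assumes a 365-day year by default and availability expressed as a fraction; the helper name `downtime_budget_minutes` is hypothetical, and this is one common way to frame the math, not necessarily the linked article's approach.

```python
def downtime_budget_minutes(availability: float, period_days: float = 365.0) -> float:
    """Minutes of allowed downtime in a period for an availability target.

    Hypothetical helper: availability is a fraction strictly between 0 and 1,
    and the period defaults to a 365-day year.
    """
    if not 0.0 < availability < 1.0:
        raise ValueError("availability must be between 0 and 1 (exclusive)")
    total_minutes = period_days * 24 * 60
    # The error budget is the fraction of the period you are allowed to be down.
    return total_minutes * (1.0 - availability)


# Each additional nine shrinks the downtime budget by a factor of ten.
for target in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{target:.3%}: {downtime_budget_minutes(target):.1f} min/year")
```

Over a 30-day window instead of a year, `downtime_budget_minutes(0.999, 30)` works out to roughly 43 minutes, which shows how little room each extra nine leaves for incidents and risky changes.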

If you’re looking to advance in SRE, this article has some examples of the skills and experience you should aim for.

  Prabesh

Will Gallego shows us a way of thinking that helps turn “should haves” into deeper understanding of our sociotechnical systems.

  Will Gallego

Some words of wisdom I came across this week about startups choosing not to invest in scalability too early.

  Vassil Popovski

Some commenters in this reddit thread say it’s easier to be called an SRE, but what does that title actually mean? Some say SRE has gotten easier, and some say it’s gotten harder. What do you think?

  u/sreiously and others — reddit

The full report isn’t available yet (and may never be), but this executive summary has a lot of juicy bits about the major 2022 Rogers internet and emergency service outage in Canada.

  Xona Partners, Inc.

The Rogers report executive summary includes some blamey and blame-adjacent language, and this analysis does a good job of calling it out and suggesting ways to recast it.

  Lorin Hochstein

The Rogers outage report executive summary indicates that truly out-of-band network management access may have made recovery easier. What exactly is involved in setting that up?

  Chris Siebenmann

Updated: July 7, 2024 — 7:56 pm
A production of Tinker Tinker Tinker, LLC