SRE Weekly Issue #467

A message from our sponsor, incident.io:

SEV0 is back. This fall, we’re bringing together the best minds in incident management for a day of learning, sharing, and networking in San Francisco and London. RSVP now—tickets are complimentary.

https://go.incident.io/SEV0-2025

It’s been a while since we’ve seen any updates from the LFI (Learning From Incidents) folks, but here’s a brand-new home for the community. I’ve bought my membership.

I like this article’s measured approach to anomaly detection and other AIOps features. Will it work? With your data?

  Jacek Migdal — Quesma

A structured approach to system design includes defining the problem, scope, tenets, risks, assumptions, and architecture choices.

I like how this article follows the process it lays out by writing an example design for a distributed search engine.

  Nikunj Agarwal — DZone

A mental model to detect and prevent optimizing the wrong thing, at the wrong time, or for the wrong reasons

This is the first time I’ve seen premature optimization dissected in this way, and I really like this model.

  Alex Ewerlöf

My favorite part of this podcast episode is the discussion of the unintended consequences of automation and “humans-are-better-at/machines-are-better-at” oversimplification. The transcript is great in case you’re not able to listen.

  Shane Hastie, with guest Courtney Nash — InfoQ

What role is an AI tool going to play in your sociotechnical system? This article gives you 12 insightful questions that will help guide your approach.

  Fred Hebert — Honeycomb

As long as there’s at least one HDD ‘tape’ filesystem mounted, you can count them, but once there are none, the result of counting them is not 0 but nothing.

And “nothing” doesn’t cause an alert. Oops!

  Chris Siebenmann
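The blurb doesn’t show the query involved, so here is a minimal Python sketch of the same trap. It assumes a monitoring setup where a query groups and counts matching items, and an alert rule is checked once per series that the query returns. The mount list and the function names (`count_by_kind`, `naive_alerts`, `safe_alerts`) are hypothetical and for illustration only.

```python
from collections import Counter

# Hypothetical mount records: (mountpoint, kind). "tape" stands in for the
# HDD 'tape' filesystems in the blurb; the data below is made up.
MountList = list[tuple[str, str]]


def count_by_kind(mounts: MountList, kind: str) -> dict[str, int]:
    """Group-and-count like a metrics query.

    Returns {kind: n} when at least one mount matches, and {} (no series
    at all, not a series with value 0) when none do.
    """
    return dict(Counter(k for _, k in mounts if k == kind))


def naive_alerts(result: dict[str, int], threshold: int = 1) -> list[str]:
    # The rule is checked once per returned series. An empty result has
    # no series, so there is nothing to compare and nothing fires.
    return [k for k, n in result.items() if n < threshold]


def safe_alerts(result: dict[str, int], kind: str, threshold: int = 1) -> list[str]:
    # Treat a missing series as an explicit 0 before comparing.
    n = result.get(kind, 0)
    return [kind] if n < threshold else []


if __name__ == "__main__":
    with_tape = [("/", "root"), ("/backup/tape1", "tape")]
    without_tape = [("/", "root")]
    for mounts in (with_tape, without_tape):
        result = count_by_kind(mounts, "tape")
        print(result, naive_alerts(result), safe_alerts(result, "tape"))
```

With one ‘tape’ mount present, the count is `{'tape': 1}` and neither rule fires, as intended. With none, the count is `{}`: the naive rule has no series to compare and stays silent, while `safe_alerts` treats the missing series as 0 and fires.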

Updated: March 9, 2025 — 9:56 pm
A production of Tinker Tinker Tinker, LLC