SRE Weekly Issue #428

A message from our sponsor, FireHydrant:

We’ve gone all out on our new integration with Microsoft Teams. If you’re an MS Teams user, FireHydrant now supports the most comprehensive integration for incident management. Run the entire IM process without ever leaving the chat.

https://firehydrant.com/blog/introducing-a-brand-new-microsoft-teams-integration/

This article presents an incident theme that I’ve lived through many times but never had such a pithy name for.

  Geoff Townsend — Blameless

There are risks and downsides inherent in a distributed system, so it’s worth thinking about whether you really need one.

  Pipitz — Adevinta

And here’s a counterpoint to the previous article: deciding whether you need a distributed system isn’t just about scale.

  Marc Brooker

The effectiveness of memes in availability campaigns.

This short post is a pile of memes, and the video one is top notch.

  Ross Brodbeck

Paraphrasing part of this article: either you didn’t understand your system fully when you wrote the alert, or there really are sporadic failures.

  Chris Siebenmann

If you’ve ever created an action item from an incident along the lines of “don’t take unnecessary risks in the future”, you need to read this one.

The rest of you need to read it too.

  Lorin Hochstein

A how-to for building anomaly detection alerting in Prometheus with specific config examples.

  Karl Stoney

A panicked engineer asks reddit’s r/sre about an incident they caused: how could they have done better? Will they be fired? The comments are spot on, and this conversation is fresh enough that you could jump in too if you’re interested.

  u/console_fulcrum and others — reddit

Last Monday, Honeycomb had an outage related to a schema migration involving MySQL’s ENUM data type, and they posted this incident report.

Bonus content: I wasn’t aware of ENUMs at all, so I had to brush up with this article: 8 Reasons Why MySQL’s ENUM Data Type Is Evil.

  Honeycomb

  Full disclosure: Honeycomb is my employer.

An experienced SRE discusses the skills and experiences you might be quizzed about in an interview for an SRE role.

  Krishna Vinnakota — DZone

Updated: June 9, 2024 — 9:40 pm
A production of Tinker Tinker Tinker, LLC