SRE Weekly Issue #401

A message from our sponsor, FireHydrant:

Join FireHydrant Dec. 14 for a conversation about on-call culture and its effect on engineering organizations, featuring special guests from Outreach and Udemy. Gain a better understanding of what makes excellent on-call culture and how to implement practices to improve yours.
https://app.livestorm.co/firehydrant/better-incidents-winter-bonfire-inside-on-call?type=detailed

Maybe you’re thinking of skipping over “yet another article about blamelessness”? Don’t. This one has some great examples and stories and is well worth a read.

  Michael Hart

I’m definitely guilty of a couple of these.

  Code Reliant

New podcast relevant to our interests!

In this series, you’ll hear insightful conversations with engineers, product managers, co-founders and more, all about the debatable topic of incident management.

  Luis Gonzalez — incident.io

A puzzling performance regression in EBS volumes, seemingly reproducible across instances. Anyone else seeing anything like this?

  Dustin Brown — dolthub

This article presents a framework for scaling SRE teams by defining SRE processes, automating, and iterating.

  Stelios Manioudakis — DZone

Some tips on what makes a good alert and how to design your alerts to be actually useful, rather than just noise.

  Leon Adato — Kentik

Why would you want multiple different targets for the same SLO? Read this one to find out.

  Alex Ewerlöf

Conflict-free Replicated Data Types (CRDTs) are powerful, but they have downsides, which this article explains, so it’d be great if we could avoid them when possible.

  Zak Knill

Updated: December 3, 2023 — 9:16 pm
A production of Tinker Tinker Tinker, LLC