SRE Weekly Issue #468

A message from our sponsor, incident.io:

MTTx metrics fall short—learn the new industry benchmarks for measuring and improving incident management. Join us on Tuesday, March 18th to discover data-driven insights from 100K+ incidents and practical steps to enhance your response strategy.

https://go.incident.io/registration.goldcast.io/webinar/going-beyond-mttx-measuring-what-good-incident-management-looks-like

No matter how bullet-proof you build the components of your system, the only way to make nines go up is to be ready to deal with the host of surprises that take them back down.

  Clint Byrum

Here’s an example of a really great application of Bloom filters, in which speed is key and a slight risk of false positives is acceptable.

  Alex Gardiner — Klaviyo
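
For readers who haven't met Bloom filters before, here is a minimal sketch of the general idea in Python. It is not the implementation from the linked article: the class name, the bit-array size, and the use of salted SHA-256 to derive positions are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical): a bit array plus k hash positions per item."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting a SHA-256 hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit that the item hashes to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("user-123")
print(seen.might_contain("user-123"))  # True: an added item is never missed
```

A lookup only touches a few bits, which is where the speed comes from. The trade-off is that an item that was never added can occasionally be reported as present if other items happened to set the same bits.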

This fun video gives us a small glimpse into the world of traffic light controllers, and more importantly, what makes them reliable. There’s also a longer video that goes deeper into why a Raspberry Pi isn’t up to the job.

  Traffic Light Doctor

Here’s an overview of several options to scale Prometheus beyond a single instance, including a handy table of features and functionality.

  Gaurav Maheshwari

A nice guide for using incident analysis in your home lab setup, plus a write-up for an incident experienced by the author.

  Barush Mendez

A highly detailed explanation of Paxos with diagrams and a model in FizzBee.

  Lorin Hochstein

I’ve boiled my frustration down to three problems:

  1. No one agrees on what “microservice” means.
  2. Microservices conversations are abstract, with little tie-in to real business goals.
  3. Adopting microservices without changing your organisation is pointless.

  Ian Miell — Container Solutions

Updated: March 16, 2025 — 9:53 pm
A production of Tinker Tinker Tinker, LLC