SRE Weekly Issue #417

A message from our sponsor, FireHydrant:

Join FireHydrant this Thursday for a conversation about on-call burnout and how to prevent it. Get a better understanding of what makes a fatigue-free on-call culture, including real-world examples from your incident management peers. No sales, just shop talk.
https://app.livestorm.co/firehydrant/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024

Remember that cool lava lamp random number generator that Cloudflare uses? Now they have a couple of other sources of entropy, and they’re teaming up with other companies.

  Cefan Daniel Rubin, Luke Valenta, and Thibault Meunier — Cloudflare

To support 123 million simultaneous streams (!), Paramount+ migrated to a multi-region architecture with a distributed, multi-write database.

  Denis Magda — Yugabyte

DORA can stand for DevOps Research and Assessment or for the Digital Operational Resilience Act. Which is which? Turns out they both matter to SREs.

  Lee Fredricks — PagerDuty

2038 isn’t so far off now. Do you have a plan for 64-bit timestamps?

  Code Reliant
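
For anyone who wants to see the boundary for themselves, here is a minimal sketch (mine, not taken from the linked article) of where a signed 32-bit Unix timestamp runs out and what one second later looks like if the value wraps. The helper names `fits_in_int32` and `wrap_to_int32` are made up for illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit timestamp can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fits_in_int32(ts: int) -> bool:
    """Return True if the Unix timestamp fits in a signed 32-bit integer."""
    try:
        struct.pack("<i", ts)
        return True
    except struct.error:
        return False


def wrap_to_int32(ts: int) -> int:
    """Simulate two's-complement wraparound of a timestamp stored in 32 bits."""
    return struct.unpack("<i", struct.pack("<I", ts & 0xFFFFFFFF))[0]


def to_utc(ts: int) -> datetime:
    """Convert seconds since the epoch to a UTC datetime (negative values allowed)."""
    return EPOCH + timedelta(seconds=ts)


last_ok = INT32_MAX
first_bad = INT32_MAX + 1

print(to_utc(last_ok).isoformat())                     # 2038-01-19T03:14:07+00:00
print(fits_in_int32(first_bad))                        # False
print(to_utc(wrap_to_int32(first_bad)).isoformat())    # 1901-12-13T20:45:52+00:00
print(len(struct.pack("<q", first_bad)))               # 8 (a 64-bit field holds it fine)
```

The same kind of check can be pointed at any binary format or database column you suspect of storing time in 32 bits.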

To ensure they would dogfood the new account process regularly, these folks delete a random employee’s account in their product every day.

  Greg Foster — Graphite

Hey, check it out, sidecars are going to be fully supported in upcoming versions of Kubernetes!

  Steven Aldinger — TeamSnap

As part of releasing a new product, FireHydrant ran simulations to determine the right SLO — and uncover some room for optimization.

  Danielle Leong — FireHydrant

  This article is published by my sponsor, FireHydrant, but their sponsorship did not influence its inclusion in this issue.

If you’re new to distributed tracing, this is a great overview. The part about automated instrumentation for span tracing is especially useful.

  Chris Battarbee — Metoro

Updated: March 24, 2024 — 9:08 pm
A production of Tinker Tinker Tinker, LLC