SRE Weekly Issue #445

A message from our sponsor, FireHydrant:

FireHydrant has acquired Blameless! The addition of Blameless’ enterprise capabilities combined with FireHydrant’s platform creates the most comprehensive enterprise incident management solution in the market.

https://firehydrant.com/blog/press-release-firehydrant-acquires-blameless-to-further-solidify-enterprise/

Providing incident resolution times to customers puts unneeded stress on responders, with very little gain.

  Robert Ross — FireHydrant

I can’t tell you how many times I’ve found myself lost in thought, wondering how something like EBS works. While this isn’t an architecture overview, it does contain a bunch of juicy tidbits. I especially like the bit about the value of a “full stack engineer”.

  Marc Olson — All Things Distributed

This article explains how to use eBPF to gather observability data, including an example eBPF program and instructions on how to run it.

  Kranthi Kiran Erusu — DZone

Netflix uses multiple kinds of data stores. Developers found it difficult to manage the differences between them, so Netflix built an abstraction layer.

Our goal was to build a versatile and efficient data storage solution that could handle a wide variety of use cases, ranging from the simplest hashmaps to more complex data structures, all while ensuring high availability, tunable consistency, and low latency.

  Vidhya Arvind, Rajasekhar Ummadisetty, Joey Lynch, and Vinay Chella — Netflix

This post looks at the challenges of predicting capacity in a global CDN, including dealing with uncertainties in customer growth, traffic routing, hardware failure, and more.

  Curt Robords — Cloudflare

GitHub tells us about the tools they use to improve reliability and performance, including Scientist and Flipper.

  Nick Hengeveld — GitHub

If you’re heavily action-item-oriented like I used to be, this is a great read to get you thinking down a different path.

My coworker wrote this awesome script to update our various @team-oncall aliases in Slack automatically, following our PagerDuty on-call schedule. This one thing has already saved us so much in the way of toil, frustration, and missed notifications!

  Fred Hebert — Honeycomb

  Full disclosure: Honeycomb is my employer.

Updated: October 6, 2024 — 9:08 pm
A production of Tinker Tinker Tinker, LLC