SRE Weekly Issue #410

A message from our sponsor, FireHydrant:

How many seats are you paying for in your legacy alerting tool that rarely get paged? With Signals’ bucket pricing, you only pay for what you use. Join the beta for a better tool at a better price.
https://firehydrant.com/blog/signals-beta-live/

In this blog post, we describe the journey DoorDash took using a service mesh to realize data transfer cost savings without sacrificing service quality.

  Hochuen Wong and Levon Stepanian — DoorDash

When just a few “regulars” are called in to handle every incident, you’ve got a knowledge gap to fill in your organization.

  David Ridge — PagerDuty

Dropbox expands into new datacenters often, so they have a streamlined and detailed process for choosing datacenter vendors.

  Edward del Rio — Dropbox

This is either nine things that could derail your SRE program, or a list of things to do with “not” in front of them — either way, it’s a good list.

  Shyam Venkat

We need enough alerting in our systems that we can detect lurking anomalies, but not so much that we get alert fatigue.

  Dennis Henry

A post about the importance of product in SRE, and how to make product and SRE first-class citizens in your Software Development Lifecycle.

  Jamie Allen

A relatively minor incident took a turn for the worse when the pilots tried a close fly-by to resolve it. I swear I’ve been in this kind of incident before, where I took risks significantly out of proportion to the problem I was trying to solve.

  Kyra Dempsey (Admiral Cloudberg)

Updated: February 4, 2024 — 8:06 pm
A production of Tinker Tinker Tinker, LLC