SRE Weekly Issue #439

A message from our sponsor, FireHydrant:

Migrate off of PagerDuty, save money, and then have all of your configuration exported as Terraform modules? We did that. We know one of the hardest parts of leaving a legacy tool is the old configuration, which is why we dedicated time to building the Signals migrator, making it easy to switch.

Read on to learn why client-side network monitoring is so important and what you are missing if your only visibility into network performance is from a backend perspective.

  Fredric Newberg — The New Stack

An engineer with no Kubernetes experience migrates an app to Kubernetes — with a bit of help from StackOverflow and Copilot, of course.

  Jacob Brandt — Klaviyo

As data teams become increasingly critical, problems in their systems become incidents. Here’s an overview of how one data team has designed their incident response process.

  Navo Das —

Certificate pinning can be a useful practice, but it’s also fraught with pitfalls and outage risks, especially with the modern tendency toward shorter-lived certificates and multiple intermediates. What can we do instead?

  Dina Kozlov — Cloudflare

A super-thorough overview of SLAs with a helpful section on how to choose the level for an SLA.

  Diana Bocco — UptimeRobot

This debugging story focuses on a Linux TCP option I wasn’t familiar with: tcp_slow_start_after_idle.

  Amnon Cohen — Ably

This is the story of a company that got an unexpectedly huge rush of interest in their platform—and traffic too. They made a number of changes to quickly scale to meet the demand.

  Jekaterina Petrova — Dyninno

This Honeycomb incident followup seems to be related to their post that I shared last week.


  Full disclosure: Honeycomb is my employer.

Updated: August 25, 2024 — 10:26 pm
A production of Tinker Tinker Tinker, LLC