SRE Weekly Issue #485

How would you migrate several million databases with minimal impact on your users?

Atlassian allocates one Postgres database per tenant customer, with a few thousand colocated on each RDS instance. This migration story was a riveting read!

  Pat Rubis — Atlassian

Here’s my claim: providing details on how things went well will reduce your future mitigation time even more than focusing on what went wrong.

  Lorin Hochstein

My favorite part of this article was the explanation of how they handle pent-up logs when a customer’s endpoint recovers, without overwhelming the endpoint.

  Gabriel Reid — Datadog

How do you deal with fundamental surprise? This article introduces the concept of surprise2: the kind of incident you couldn’t have seen coming. Click through for strategies to handle the occasional, inevitable, fundamentally surprising incident.

  Stuart Rimell — Uptime Labs

A team found themselves needing to switch to microservices, and they chronicled their approach and results. I really like the section on the surprises they encountered.

  Shushyam Malige Sharanappa — DZone

Dropbox shares what went into the rollout of their new fleet, including careful management of heat, vibration, and power.

  Eric Shobe and Jared Mednick — Dropbox

In this blog post, we’ll dive into the details of three mighty alerts that play their unique role in supporting our production infrastructure, and explore how they’ve helped us maintain the high level of performance and uptime that our community relies on.

…plus one bonus alert!

  Jeremy Udit — Hugging Face

Klaviyo adopted RDS’s blue/green deployment feature to make MySQL version upgrades much less painful. In this article they share their path to blue/green deployment and their results.

  Marc Dellavolpe — Klaviyo

Updated: July 13, 2025 — 9:43 pm
A production of Tinker Tinker Tinker, LLC