How would you migrate several million databases with minimal impact on your users?
Atlassian allocates one Postgres database per tenant customer, with a few thousand colocated on each RDS instance. This migration story was a riveting read!
Pat Rubis — Atlassian
Here’s my claim: providing details on how things went well will reduce your future mitigation time even more than focusing on what went wrong.
Lorin Hochstein
My favorite part of this article was the explanation of how they handle pent-up logs when a customer’s endpoint recovers, without overwhelming the endpoint.
Gabriel Reid — Datadog
How do you deal with fundamental surprise? This article introduces the concept of surprise2, an incident you couldn’t see coming. Click through for some strategies for handling the occasional, inevitable, fundamentally surprising incident.
Stuart Rimell — Uptime Labs
A team found themselves needing to switch to microservices, and they chronicled their approach and results. I really like the section on the surprises they encountered.
Shushyam Malige Sharanappa — DZone
Dropbox shares what went into the rollout of their new hardware fleet, including careful management of heat, vibration, and power.
Eric Shobe and Jared Mednick — Dropbox
In this blog post, we’ll dive into the details of three mighty alerts that play their unique role in supporting our production infrastructure, and explore how they’ve helped us maintain the high level of performance and uptime that our community relies on.
…plus one bonus alert!
Jeremy Udit — Hugging Face
Klaviyo adopted RDS’s blue/green deployment feature to make MySQL version upgrades much less painful. In this article they share their path to blue/green deployment and their results.
Marc Dellavolpe — Klaviyo