SRE Weekly Issue #485

Migrating the Jira Database Platform to AWS Aurora

How would you migrate several million databases with minimal impact on your users?

Atlassian allocates one Postgres database per tenant customer, with a few thousand colocated on each RDS instance. This migration story was a riveting read!

Pat Rubis — Atlassian

“What went well” is more than just a pat on the back

Here’s my claim: providing details on how things went well will reduce your future mitigation time even more than focusing on what went wrong.

Lorin Hochstein

How we built reliable log delivery to thousands of unpredictable endpoints

My favorite part of this article was the explanation of how they handle pent-up logs when a customer’s endpoint recovers, without overwhelming the endpoint.

Gabriel Reid — Datadog

Surprise Surprise: When Reality Doesn’t Read the Runbook

How do you deal with fundamental surprise? This article introduces the concept of surprise², an incident you couldn’t see coming. Click through for some strategies to handle the occasional, inevitable, fundamentally surprising incident.

Stuart Rimell — Uptime Labs

How We Broke the Monolith (and Kept Our Sanity): Lessons From Moving to Microservices

A team found themselves needing to switch to microservices, and they chronicled their approach and results. I really like the section on the surprises they encountered.

Shushyam Malige Sharanappa — DZone

Seventh-generation server hardware at Dropbox: our most efficient and capable architecture yet

Dropbox shares what went into the rollout of their new fleet, including careful management of heat, vibration, and power.

Eric Shobe and Jared Mednick — Dropbox

Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure

In this blog post, we’ll dive into the details of three mighty alerts that play their unique role in supporting our production infrastructure, and explore how they’ve helped us maintain the high level of performance and uptime that our community relies on.

…plus one bonus alert!

Jeremy Udit — Hugging Face

Our Experience with Amazon Aurora Blue/Green Deployments

Klaviyo adopted RDS’s blue/green deployment feature to make MySQL version upgrades much less painful. In this article they share their path to blue/green deployment and their results.

Marc Dellavolpe — Klaviyo
