A new spin on changing the engines on a jet in flight: using DNS request/response rewriting to transition an application over without modification.
lainra — Mercari
How much additional capacity can you get for a dollar?
Dealing with the unknown, limited cognitive bandwidth, coordination patterns, psychological safety and feeding information back into the organization.
Fred Hebert — The New Stack
Full disclosure: Honeycomb is my employer.
How do you enable adoption of SRE principles at a large, mature company that has little capacity for innovation?
the value proposition of “SRE” is the idea that you can handle an exponentially growing business with a logarithmically growing payroll.
Read this one to learn about four attributes of good alerting and how to ensure your SLO burn rate alerts are effective.
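For readers new to the term, here is a minimal sketch of what a burn rate alert computes. It is not taken from the linked article: the function names are hypothetical, and the 14.4 threshold is an assumed example value (2% of a 30-day error budget spent in one hour). Burn rate is the observed error ratio divided by the error budget, and paging only when both a long and a short window exceed the threshold is one common way to keep alerts from lingering after recovery.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are occurring.

    error_ratio: fraction of failed requests in the window (0.0 to 1.0).
    slo_target:  the SLO as a fraction, e.g. 0.999 for 99.9%.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_errors: float,
                short_window_errors: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn budget at or above the threshold.

    The default threshold is an assumed example, not a recommendation.
    """
    return (burn_rate(long_window_errors, slo_target) >= threshold
            and burn_rate(short_window_errors, slo_target) >= threshold)


if __name__ == "__main__":
    # 2% errors against a 99.9% SLO in both windows: well above the threshold.
    print(should_page(0.02, 0.02, 0.999))
    # The short window has recovered, so the alert would not fire.
    print(should_page(0.02, 0.0, 0.999))
```

The short window acts as a reset condition: once errors stop, the alert clears quickly even though the long window still shows elevated burn.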
There’s plenty of content out there telling you how to implement observability, or what good looks like. But what about bad observability? What are some anti-patterns to watch out for?
Stephen Townshend — SquaredUp
This is an interview about on-call with Twilio’s VP of SRE, who also spent 17 years as an SRE at Google.
They started with a (mostly) single-availability-zone Kafka deployment. Here’s how they transitioned to a multi-zone architecture that can survive a single AZ failure.
Andrey Polyakov and Kamya Shethia — Etsy