Articles
A new spin on changing the engines on a jet in flight: using DNS request/response rewriting to transition an application over without modification.
lainra - Mercari
How much additional capacity can you get for a dollar?
Dan Slimmon
Dealing with the unknown, limited cognitive bandwidth, coordination patterns, psychological safety and feeding information back into the organization.
Fred Hebert - The New Stack
Full disclosure: Honeycomb is my employer.
How do you enable adoption of SRE principles at a large, mature company that has little capacity for innovation?
the value proposition of “SRE” is the idea that you can handle an exponentially growing business with a logarithmically growing payroll.
Layer Aleph
Read this one to learn about four attributes of good alerting and how to ensure your SLO burn rate alerts are effective.
Saheed Oladosu
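For readers new to the term: a burn rate is the observed error rate divided by the error budget (1 minus the SLO target), so a value of 1.0 means the budget is being spent exactly on schedule. The sketch below is not taken from the linked article. The function names, the 99.9% target, and the 14.4 threshold (a value commonly cited for a 1-hour window against a 30-day SLO) are illustrative assumptions.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    1.0 means errors arrive at exactly the rate the SLO allows;
    higher values mean the budget will run out early.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return (errors / total) / error_budget


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows exceed the threshold (hypothetical policy).

    Requiring the short window too keeps the alert from firing long
    after a spike has already ended.
    """
    return long_window_rate >= threshold and short_window_rate >= threshold


if __name__ == "__main__":
    rate = burn_rate(errors=20, total=1000, slo_target=0.999)
    print(f"burn rate: {rate:.1f}, page: {should_page(rate, rate)}")
```

Checking a long and a short window together is one common way to trade detection speed against alert noise; the right windows and thresholds depend on the service.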
There’s plenty of content out there telling you how to implement observability, or what good looks like. But what about bad observability? What are some anti-patterns to watch out for?
Stephen Townshend - SquaredUp
This is an interview about on-call with Twilio’s VP of SRE who also spent 17 years as an SRE at Google.
Elena Boroda
They started with a (mostly) single-availability-zone Kafka deployment. Here’s how they transitioned to a multi-zone architecture that can survive a single AZ failure.
Andrey Polyakov and Kamya Shethia - Etsy