Bit of a short one this week as I recover from my third bout of COVID. Fortunately, this is another relatively mild one (thank you, vaccine!). Good luck everyone, and get your boosters.
This article explores the advantages of powering SLOs with observability data.
Pierre Tessier — Honeycomb
Full disclosure: Honeycomb is my employer.
As the James Webb Space Telescope moves into normal operations, there are more great SRE lessons to be learned.
Jennifer Riggins — The New Stack
Drawing on 5 years of experience as an SRE, the author of this article shares a set of best-practice patterns for software development and operation.
How Airbnb built a persistent, high-availability, low-latency key-value storage engine for accessing derived data from offline and streaming events.
Chandramouli Rangarajan, Shouyan Guo, Yuxi Jin — Airbnb
By owning and reporting MTTR, teams have no choice but to be accountable for the reliability of the code they write. This dramatically changes the culture of engineering.
Sidu Ponnappa — Last9
I learned about plan continuation bias while reading this air accident report, and I’m certain I’ve experienced this during incidents I’ve been involved in.