SRE Weekly Issue #343

Bit of a short one this week as I recover from my third bout of COVID. Fortunately, this is another relatively mild one (thank you, vaccine!). Good luck everyone, and get your boosters.

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒.

Rootly automates manual tasks like creating an incident channel, a Jira ticket, and Zoom rooms, inviting responders, creating status page updates, building postmortem timelines, and more. Want to see why companies like Canva and Grammarly love us?
This article explores the advantages of powering SLOs with observability data.

  Pierre Tessier — Honeycomb
  Full disclosure: Honeycomb is my employer.

As the James Webb Space Telescope moves into normal operations, there are more great SRE lessons to be learned.

  Jennifer Riggins — The New Stack

Over five years as an SRE, the author of this article gathered a set of best-practice patterns for software development and operations, which they share here.

  brandon willett

How Airbnb built a persistent, highly available, low-latency key-value storage engine for accessing data derived from offline and streaming events.

  Chandramouli Rangarajan, Shouyan Guo, Yuxi Jin — Airbnb

By owning and reporting MTTR, teams have no choice but to be accountable for the reliability of the code they write. This dramatically changes the culture of engineering.

  Sidu Ponnappa — Last9
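For readers who haven't computed MTTR themselves, here is a minimal sketch. It assumes MTTR means mean time to resolve, measured from detection to resolution; the article may define the start and end points differently. The incident timestamps and the `mttr` helper are hypothetical, for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (detected_at, resolved_at)
incidents = [
    (datetime(2022, 10, 1, 9, 0), datetime(2022, 10, 1, 9, 45)),    # 45 min
    (datetime(2022, 10, 5, 14, 0), datetime(2022, 10, 5, 16, 15)),  # 135 min
    (datetime(2022, 10, 9, 22, 30), datetime(2022, 10, 9, 23, 0)),  # 30 min
]

def mttr(records):
    """Mean time to resolve: the average of (resolved - detected) across incidents."""
    if not records:
        raise ValueError("need at least one incident to compute MTTR")
    # Sum the durations, starting from a zero timedelta instead of the int 0
    total = sum((end - start for start, end in records), timedelta())
    return total / len(records)

print(mttr(incidents))  # 1:10:00
```

Whichever definition a team picks, applying it consistently matters more than the exact choice, because a shifting start or end point makes the trend meaningless.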

I learned about plan continuation bias while reading this air accident report, and I’m certain I’ve experienced this during incidents I’ve been involved in.

  Admiral Cloudberg

Updated: October 16, 2022 — 9:21 pm
A production of Tinker Tinker Tinker, LLC