SRE Weekly Issue #319

Articles

Incident Response Isn’t Enough

Be judicious when you generate remediation tasks from incidents, or you can end up investing in the wrong area.

Marc Brooker

ZEN and the art of Reliability

Zendesk SRE has a set of 8 reliability principles that guide what they do.

Jason Smale — Zendesk

Incident management best practices: before the incident

We’re going to talk about a few necessities that enable exceptional incident management.

Service ownership

Incident roles

The incident declaration process

Running incident drills

Robert Ross — FireHydrant

A Foolish Consistency: Consul at Fly.io

I don’t think you’re supposed to use Consul that way…

Read this article to follow along on an interesting design journey.

Thomas Ptacek — Fly.io

Slight Reliability Episode 6 – Afailability

A single metric for availability probably can’t tell you the whole story.

Stephen Townshend — Slight Reliability

Making operational work more visible

We can learn from the process another engineer follows to debug a problem. But a ticket or problem description often records only the answer, stripped of the process that produced it, which hampers learning.

Lorin Hochstein — The ReadME Project (GitHub)

The Merpay SRE Team: Past and future

We’re still not 100% there as a team, but I hope this article will serve as a reference for anyone who might create an SRE team in the future.

@tjun — Mercari

Incident Analysis 101: Techniques for Sharing Incident Findings

This article gives 6 different ways to organize the findings from your retrospective to share with different audiences.

Vanessa Huerta Granda — Jeli

Gyros and Gimbals, oh my! — The James Webb Space Telescope

There’s a great reliability story in the way that the Hubble telescope and the Apollo missions used gimbals — and in the way that the JWST doesn’t.

Robert Barron — IBM

Outages

Hulu

IRS

The US Internal Revenue Service’s systems went down on the due date for tax filing.

Instagram