SRE Weekly Issue #326

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating the incident channel, Jira ticket, and Zoom meeting, paging and adding responders, building the postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set):
https://rootly.com/demo/

Articles

Catchpoint and Blameless have teamed up on this year’s SRE survey. They’ve sweetened the deal with two $5 donations to charity for every survey completed. Go do it!

  Kurt Andersen — Blameless

I sure miss the good old “checkmark-i” icon. Oh wait, no, I don’t.

  Jeff Martens — Metrist

How can you handle failure gracefully? Click through for 6 strategies to consider.

  Boris Cherkasky — Riskified

Declaring your first incident at a new job can be intimidating, but it really shouldn’t be. Let’s look at some common fears and work out how to address them.

  Isaac Seymour — incident.io

The incident involved a fiber equipment failure and a suboptimal automated remediation.

  Google

This is a primer on Urgency and Impact in incidents, including the difference between them and how to use them.

  Noor-ul-Anam Ruqayya — Blameless

Running retrospectives on near-miss incidents can be highly valuable, as this article discusses.

  Vanessa Huerta Granda — Jeli

Outages

Updated: June 12, 2022 — 9:45 pm
A production of Tinker Tinker Tinker, LLC