SRE Weekly Issue #313

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channels, Jira tickets, and Zoom meetings; paging and adding responders; building postmortem timelines; setting up reminders; and more. Book a demo (+ get a snazzy Rootly lego set):
https://rootly.com/demo/

Articles

Do you need an incident commander? (Yes.) This article is about how to staff your incident command rotation through a couple of different strategies.

  Ryan McDonald — FireHydrant

What an interesting idea: an insurance plan that pays out automatically when a cloud provider has an outage.

  L.S. Howard — Insurance Journal
Full disclosure: Fastly, my employer, is mentioned.

LaunchDarkly revamped how their on-call system works. Learn about the experience through the eyes of a newly onboarded engineer.

  Anna Baker — LaunchDarkly (via The New Stack)

Catchpoint’s yearly SRE Report is out with four key findings. You have to fill out a form with your email address, and then the link to download the report is presented in your browser.

  Catchpoint

This article shows why one-thread-per-request can be a bottleneck and presents alternatives.

  Ron Pressler — Parallel Universe (via High Scalability)

And this is a truth about incidents: there are always more signals than there is attention available.

It’s so true.

  Fred Hebert — Honeycomb

If you’ve ever even considered running a retrospective, read this article.

This is my favorite piece of advice from this article:

If you think ‘this might be a stupid question,’ ask it.

  Emily Ruppe — Jeli

I’m still not sure how I feel about AIOps. Fortunately, this article takes a measured stance while providing some useful insight.

Conclusion: AI won’t replace SREs – but it can help

  JJ Tang — Rootly
This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.

Outages

Updated: March 13, 2022 — 9:47 pm