SRE Weekly Issue #351

Seven years ago, I was busy pulling together content for the first several issues of SRE Weekly. Since then, I estimate that I’ve consumed over 6000 articles in my quest to curate content each week, most of them via text-to-speech. You all make it worthwhile! Thank you so much for reading, and thanks to all of the great authors out there for writing awesome articles. Here’s to another great year!

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒.

Rootly automates manual tasks like creating an incident channel, Jira ticket, and Zoom room, inviting responders, and creating statuspage updates, postmortem timelines, and more. Want to see why companies like Canva and Grammarly love us?

https://rootly.com/demo/

Articles

In this interview, Tammy Butow goes into detail on what it’s like being on call and how she improved a team’s horrible on-call burden by a factor of 10.

  Elena Boroda — Fiberplane

Do you need just one or two SREs? Or should you build a sprawling SRE team, with a dozen or more SREs on hand to support your organization’s reliability needs?

  JJ Tang — Rootly
  This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.

An unsanctioned (but not unheard of) action, a race condition, and multiple known design issues all contributed to this air accident.

  Admiral Cloudberg

This Reddit post gives a first-hand account of one way to handle DR. Worth reading through to the end.

  u/disasterrecoverywhat — reddit

Rackspace’s Hosted Microsoft Exchange offering has been down for over a week, and they’re helping customers move to Microsoft 365 (and paying for it).

  Roger Montti — Search Engine Journal

It’s a good idea to leave yourself a safety hatch to administer your system when everything’s gone to heck… otherwise you might have to break out the angle grinders.

  Oren Eini — Hibernating Rhinos

This intriguing debugging story also sheds some light on how Honeycomb’s custom-built columnar data store works.

  Paul Osman — Honeycomb
  Full disclosure: Honeycomb is my employer.

Tons of incredibly good advice in this infographic + article on debugging.

  Julia Evans

Updated: December 11, 2022 — 8:46 pm
A production of Tinker Tinker Tinker, LLC