SRE Weekly Issue #302

Happy holidays, for those who celebrate! I put this issue together in advance, so there's no Outages section this week.

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating the incident channel, Jira ticket, and Zoom meeting, paging the right team, building the postmortem timeline, setting up reminders, and more. Book a demo:


This is another great deep-dive into strategies for zero-downtime deploys.

  Suresh Mathew — eBay

How do you make sure your incident management process survives the growth of your team? This article has a useful list of things to cover as you train new team members.

  David Caudill — Rootly
This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.

The trends in this article are:

  • AIOps and self-healing platforms
  • Service Meshes
  • Lowcode DevOps
  • GitOps
  • DevSecOps

  Biju Chacko — Squadcast

I can’t get enough of these. Please write one about your company!

  Ash Patel

My favorite part is the discussion of Kyle Kingsbury’s work on Jepsen. Would distributed systems have even more problems if Kingsbury hadn't shed light on them?

  Dan Luu

PagerDuty analyzed usage data for their platform in order to draw inferences about how the pandemic has affected incident response.


There’s a ton of interesting stuff in here about confirmation bias and fear in adopting a new, objectively less risky process.

  Robert Poston, MD

Updated: December 26, 2021 — 8:36 pm
A production of Tinker Tinker Tinker, LLC