SRE Weekly Issue #359

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒.

Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?


The Data Reliability Engineering team is here to monitor, automate and manage pipelines to enable our partner USDE teams to have the ease of mind to tackle projects to help Mercari move forward.

  LameyerDaniel and OhshimaTakako — Mercari

Hiring in the Site Reliability Engineering (SRE) space is notoriously difficult. So it makes sense to figure out how to expand the hiring pool beyond existing SREs.

  Ash Patel — SREpath

SREs end up writing a lot of YAML. I mean, a lot. Fortunately it’s a really simple language with no hidden gotchas, right? Right?!

  Ruud van Asseldonk

Two Terraform changes that were developed and tested individually went out to production simultaneously, with unexpected results.

  Jan David Nose — Rust

Code search is a different beast from ordinary English-language search. Regexes, punctuation, no word stemming, and GitHub’s scale made this a challenging design.

  Timothy Clem — GitHub

This article argues that folks outside of engineering are doing incident response, whether they call it that or not.

In incidents, we’re concentrating on resolving impact as quickly as possible, and this can impair our ability to gather the information we need after the fact in order to actually figure out what happened.

  Jake Cohen — PagerDuty

Updated: February 12, 2023 — 9:14 pm
A production of Tinker Tinker Tinker, LLC