SRE Weekly Issue #359

Articles

The Data Reliability Engineering team is here to monitor, automate, and manage pipelines so that our partner USDE teams have the peace of mind to tackle projects that help Mercari move forward.

LameyerDaniel and OhshimaTakako — Mercari

Recruiting developers into Site Reliability Engineering (SRE)

Hiring in the Site Reliability Engineering (SRE) space is notoriously difficult. So it makes sense to figure out how to expand the hiring pool beyond existing SREs.

Ash Patel — SREpath

The yaml document from hell

SREs end up writing a lot of YAML. I mean, a lot. Fortunately it’s a really simple language with no hidden gotchas, right? Right?!

Ruud van Asseldonk

DNS Outage on 2023-01-25

Two Terraform changes that were developed and tested individually went out to production simultaneously, with unexpected results.

Jan David Nose — Rust

The technology behind GitHub’s new code search

Code search is a different beast from normal English language searching. Regexes, punctuation, no word stemming, and GitHub’s scale made this a challenging design.

Timothy Clem — GitHub

Your non-technical teams should be using incident management tools, too

This article argues that folks outside of engineering are doing incident response, whether they call it that or not.

incident.io

Quick! Grab all the evidence: Capturing application state for post-incident forensics.

In incidents, we’re concentrating on resolving impact as quickly as possible, and this can impair our ability to gather the information we need after the fact in order to actually figure out what happened.

Jake Cohen — PagerDuty
