SRE Weekly Issue #384

Articles

GitHub tested a new git merge strategy using Scientist, a framework that runs both the old and new implementations and compares the results.

Jesse Toth — GitHub
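
For readers unfamiliar with the pattern, here is a minimal sketch of the idea described above: run the old ("control") and new ("candidate") code paths side by side, report any difference, and always return the control's result. This is not Scientist's actual API (Scientist is a Ruby library); the names run_experiment, _observe, _matches, and on_mismatch are hypothetical and chosen only for illustration.

```python
import random
from typing import Any, Callable, Optional, Tuple

# An observation is (return value, exception); exactly one is meaningful.
Observation = Tuple[Any, Optional[BaseException]]


def _observe(fn: Callable[[], Any]) -> Observation:
    """Run fn and capture either its return value or the exception it raised."""
    try:
        return fn(), None
    except Exception as exc:
        return None, exc


def _matches(a: Observation, b: Observation) -> bool:
    """Two observations match if values are equal and error types agree."""
    return a[0] == b[0] and type(a[1]) is type(b[1])


def run_experiment(
    control: Callable[[], Any],
    candidate: Callable[[], Any],
    on_mismatch: Callable[[Observation, Observation], None],
) -> Any:
    """Run both implementations, report mismatches, return the control's result."""
    runs = [("control", control), ("candidate", candidate)]
    random.shuffle(runs)  # randomize order so neither path always runs first
    seen = {name: _observe(fn) for name, fn in runs}

    if not _matches(seen["control"], seen["candidate"]):
        on_mismatch(seen["control"], seen["candidate"])

    # Callers only ever see the control's behavior, including its errors.
    value, error = seen["control"]
    if error is not None:
        raise error
    return value
```

Because the caller only ever receives the control's outcome, a buggy candidate cannot change production behavior; it can only show up in whatever on_mismatch records.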

Why is DNS still hard to learn?

DNS is simple (kinda), but it can be really difficult to fully wrap your head around. This article explains why, and in the process gives a blueprint for designing more understandable tools in general.

Julia Evans

Fallback

Fallback is different from Failover for a number of reasons. This article describes how they differ, how fallback works, and why you might choose it over failover.

Alex Ewerlöf

GitHub – bregman-arie/sre-checklist: A checklist of anyone practicing Site Reliability Engineering

Repository Purpose: Provide teams and individuals an idea on what to take into consideration and what to aspire for in the SRE field and work

Note: these checklists are opinionated.

Arie Bregman

Reader: Carrots, sticks, and making things worse

A thought-provoking article on trying to change people’s behavior in incidents through incentives (positive or negative) without also changing the context in which they act.

Fred Hebert — Learning From Incidents

Hardening Workers KV

Cloudflare shares what they learned as they transitioned their KV service to a new architecture, which resulted in multiple unexpected problems.

Matt Silverlock, Charles Burnett, Rob Sutter, and Kris Evans — Cloudflare

Anything But Tech Debt

In this article, learn about two interesting strategies for getting an organization to prioritize technical debt work: using a more specific name for the work, and referencing the work’s impact on an SLO — and the impact of not doing the work.

Emily Nakashima — Honeycomb
Full disclosure: Honeycomb is my employer.
