Articles
They tested this new git merge strategy by using Scientist, a framework that runs both the old and new implementations and compares the results.
Jesse Toth — GitHub
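To make the idea concrete, here is a minimal sketch of the run-both-and-compare pattern in Python. It is not Scientist's actual API (Scientist is a Ruby library); the names `run_experiment`, `record`, and the toy `control`/`candidate` functions are hypothetical, chosen only for illustration. The sketch assumes the old implementation's result is always the one returned to the caller, so that a faulty new implementation can only produce a logged mismatch, never a user-visible change.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical mismatch log: (args, control_result, candidate_result_or_exception)
mismatches: List[Tuple[Any, Any, Any]] = []


def record(args: Any, control_result: Any, candidate_result: Any) -> None:
    """Store a mismatch so it can be inspected later."""
    mismatches.append((args, control_result, candidate_result))


def run_experiment(
    control: Callable[..., Any],
    candidate: Callable[..., Any],
    *args: Any,
    on_mismatch: Optional[Callable[[Any, Any, Any], None]] = None,
) -> Any:
    """Run the old (control) and new (candidate) code paths on the same input.

    The control result is always returned. The candidate result is only
    compared against it, and candidate exceptions are caught and reported
    rather than propagated.
    """
    control_result = control(*args)
    try:
        candidate_result = candidate(*args)
    except Exception as exc:  # the new code must never break the caller
        if on_mismatch is not None:
            on_mismatch(args, control_result, exc)
        return control_result

    if candidate_result != control_result and on_mismatch is not None:
        on_mismatch(args, control_result, candidate_result)
    return control_result


# Toy implementations: the candidate has a bug (it drops duplicates).
def control(xs: List[int]) -> List[int]:
    return sorted(xs)


def candidate(xs: List[int]) -> List[int]:
    return sorted(set(xs))


first = run_experiment(control, candidate, [3, 1, 2], on_mismatch=record)
second = run_experiment(control, candidate, [2, 1, 2], on_mismatch=record)
```

In this sketch the second call still returns the control's answer, while the disagreement lands in `mismatches` for later review. A production setup would likely also sample traffic and record timings, which this sketch leaves out.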
DNS is simple (kinda), but it can be really difficult to fully wrap your head around. This article explains why, and in the process gives a blueprint for designing more understandable tools in general.
Julia Evans
Fallback is different from Failover for a number of reasons. This article describes how they differ, how fallback works, and why you might choose it over failover.
Alex Ewerlöf
Repository Purpose: Provide teams and individuals an idea on what to take into consideration and what to aspire for in the SRE field and work
Note: these checklists are opinionated.
Arie Bregman
A thought-provoking article on trying to change people’s behavior in incidents through incentives (positive or negative) without also changing the context in which they act.
Fred Hebert — Learning From Incidents
Cloudflare shares what they learned as they transitioned their KV service to a new architecture, a move that resulted in multiple unexpected problems.
Matt Silverlock, Charles Burnett, Rob Sutter, and Kris Evans — Cloudflare
In this article, learn about two interesting strategies for getting an organization to prioritize technical debt work: using a more specific name for the work, and referencing the work’s impact on an SLO — and the impact of not doing the work.
Emily Nakashima — Honeycomb
Full disclosure: Honeycomb is my employer.