Many apologies to Matt Cooper at GitHub, who is the actual author of the article Scaling Merge-ort Across GitHub from last week. Sorry for the mis-credit, Matt!
Articles
This article will really come in handy next time you need to explain SRE to your execs.
Kit Merker — DevOps.com
By mapping the Westrum Model of organizational cultures to SRE, we can understand SRE culture adoption.
Vladyslav Ukis and Ben Linders — InfoQ
Disney’s SRE teams have ensured that the magic keeps happening, even as experiences and their underlying technology become more and more complex.
Ash Patel — SREPath
There’s so much to learn from this tragedy that I might read this one again. A mid-air collision these days should be effectively impossible due to TCAS. In this case, many factors conspired to bring about disaster.
Admiral Cloudberg
Here they are, out in the open:
- SLOs create a common understanding in the organization about reliability
- SLOs require investment into improved observability
- SLOs prompt decisions about risk management… and risk-taking
Amin Astaneh — Certo Modo
The “five standard models” are actually more like a 5-stage workflow:
- Triage,
- Examine,
- Diagnose,
- Test, and
- Cure.
Saheed Oladosu
This blog post will share broadly-applicable techniques (beyond GraphQL) we used to perform this migration. The three strategies we will discuss today are AB Testing, Replay Testing, and Sticky Canaries.
Jennifer Shin, Tejas Shikhare, Will Emmanuel — Netflix
After reviewing traditional rate limiting techniques, this article explains adaptive rate limiting and its benefits.
Sudhanshu Prajapati — FluxNinja