Here’s an ultra-practical guide to pushing for reliability investments at your company, formatted as a runbook with a set of specific steps.
Ross Brodbeck
A neat dive into how Amazon’s MemoryDB composes multiple systems to create a redundant Redis-compatible data store.
Marc Brooker
This article looks into the economic and psychological impact of a culture of blame.
Lee Atchison — Blameless
It took me two read-throughs to fully get this one, and I’m really glad I did.
If we only examine people’s actions in the wake of an incident, and not when things go well, then we fall into the trap of selecting on the dependent variable.
Lorin Hochstein
To prevent dangerous deploy collisions, these folks wrote an open source tool to mediate who gets to deploy when.
Andrew Kannan — Klaviyo
If you’ve never worked at a startup before, you may be over-estimating how much you need to learn and how quickly.
When all you have is early adopters, you’re in a more forgiving environment, including for reliability.
Nicholas Yan — Graphite
Structured logging is great, but there can be pitfalls and gotchas.
Oakley Hall
An intro to SLOs with useful formulas, from the creator of the SLO Calculator featured here a while back.
Alex Ewerlöf
SRE Weekly, a production of Tinker Tinker Tinker, LLC