I’ve shared this article before, but it’s so critical that it’s worth another read. MTTR is a statistically useless metric, and relying on it leads us to draw faulty conclusions and potentially take harmful actions. Courtney Nash does a great job of laying out the science in an easy-to-understand way.
Courtney Nash — Resilience in Software Foundation / The VOID
I like the analogy here: when we say people are components in our sociotechnical systems, system diagrams are like a form of cache.
Clint Byrum
From Werner Vogels’s intro to this article:
Andy takes us through S3’s evolution from simple object store to sophisticated data platform, illustrating how customer feedback has shaped every aspect of the service. It’s a fascinating look at how we maintain simplicity even as systems scale to handle hundreds of trillions of objects.
Andy Warfield — Amazon
Instead of the traditional Cost/Performance/Reliability trade-off, this article argues that serverless presents a trade-off among Cost, Performance, and Complexity.
Luc van Donkersgoed
Google uses System-Theoretic Process Analysis (STPA) to identify problems in their systems. They found that the most effective way to spread adoption of STPA was to build their own training program.
Garrett Holthaus — Google
So far, I’m enjoying this new series of posts from Nextdoor about their efforts to scale their datastore. Here’s the first installment, covering the approaches they’ve tried so far.
I’ll share the rest of the series as I work my way through them.
Slava Markeyev — Nextdoor
Wow, I had no idea EBS volumes failed this often!
Nick Van Wiggeren — PlanetScale