Articles
It’s that time of year again, but maybe it’s time to rethink that code freeze.
Robert Ross — FireHydrant
This article really gets to the heart of why I love a good incident. I mean, obviously, I want to minimize incidents. I swear.
Lisa Karlin Curtis — incident.io
This article draws on incident reports from The VOID to show how root cause analysis can be problematic.
Courtney Nash — Verica
It’s interesting to read this article after reading the previous one. In the “my car won’t start” example, I found myself immediately wondering: why was the vehicle not maintained? What factors contributed to that?
Søren Pedersen — DZone
The article lays out these five “phases”, although it stresses that aiming for Visionary doesn’t make sense for every organization.
- Absent
- Reactive
- Proactive
- Strategic
- Visionary
Not the field I would have expected to look to for lessons, but it totally works!
Paul Marsicovetere — Formidable
This article introduces a three-phase approach for safe database schema changes: Expand, Rollout, and Contract.
Alex Yates — Octopus Deploy
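To make the three phases concrete, here is a minimal sketch of renaming a column with SQLite. It is one reading of the Expand, Rollout, and Contract idea, not the article's own example: the `users` table, the `name` and `display_name` columns, and all function names are hypothetical. The contract step rebuilds the table rather than dropping the column so that it also runs on older SQLite versions.

```python
import sqlite3


def columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def expand(conn):
    # Expand: add the new column next to the old one and backfill it.
    # Code that only knows about `name` keeps working.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.execute(
        "UPDATE users SET display_name = name WHERE display_name IS NULL"
    )


def rollout_insert(conn, value):
    # Rollout: the transitional application version writes both columns,
    # so old and new readers see consistent data.
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (value, value)
    )


def contract(conn):
    # Contract: once nothing reads `name`, remove it by rebuilding the table.
    conn.executescript(
        """
        CREATE TABLE users_new (id INTEGER PRIMARY KEY, display_name TEXT);
        INSERT INTO users_new (id, display_name)
            SELECT id, display_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_new RENAME TO users;
        """
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('ada')")
    expand(conn)
    rollout_insert(conn, "grace")
    contract(conn)
    rows = conn.execute(
        "SELECT display_name FROM users ORDER BY id"
    ).fetchall()
    return columns(conn, "users"), rows
```

In a real deployment each phase would ship separately, with time in between to confirm that nothing still depends on the old column before contracting.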
Try to run a program, and you get “No such file or directory”, even though the program is right there. How can this happen?
Julia Evans
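One way this can happen on Unix-like systems is that the file exists but the interpreter it names does not: the missing interpreter is what the error refers to. The sketch below checks the simplest case, a script whose shebang line points at a missing interpreter. The helper name and the `/nonexistent/interpreter` path are made up for illustration, and the sketch does not cover compiled binaries, where the analogous culprit is a missing dynamic loader.

```python
import os
import tempfile


def missing_shebang_interpreter(path):
    """Return the shebang interpreter if it does not exist, else None."""
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None  # not a shebang script; this check does not apply
    parts = first_line[2:].strip().split()
    if not parts:
        return None  # empty shebang line
    interpreter = os.fsdecode(parts[0])
    return None if os.path.exists(interpreter) else interpreter


def demo():
    # Write a script whose shebang names an interpreter that is not there.
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello")
        with open(script, "w") as f:
            f.write("#!/nonexistent/interpreter\necho hi\n")
        return missing_shebang_interpreter(script)
```

Calling `missing_shebang_interpreter` on a script that fails this way gives a quick first check before digging further.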
Outages
- Google Cloud Load Balancing
  - Google had a major outage that took down many sites and services. Notably, users of these sites were greeted with a Google 404 page with no branding related to the site they were attempting to access.
- Grab
- Tesla
  - Tesla owners were locked out of their cars or unable to start them during the outage.