Articles
This air traffic accident analysis is chilling to listen to, and also incredibly educational. As you listen through the conversation, it becomes more and more clear that the pilot is suffering from information overload. An Incident Commander would be wise to remember the lessons learned here.
After listening to the above recording, I got hooked and kept listening to more and more case studies. Here’s another enlightening one: Real Pilot Story: From Miscue to Rescue
US Air Safety Institute
PagerDuty is quickly approaching Etsy’s level of awesome incident-related articles and guides.
Rachael Byrne — PagerDuty
Retiring features and products can often be harder to do safely than deploying them in the first place.
Rachana Kumar — Etsy
Do your SLIs measure what really matters to your customers? This article discusses how to find out and what to do if they don’t.
Adrian Hilton and Yaniv Aknin — Google
Outages
- Google G Suite
- All services experienced an outage, most notably Gmail.
- Microsoft Azure
- Parts of Azure depended on a third-party DNS provider, and an outage at that provider caused widespread issues in Azure. See Microsoft’s follow-up post in their status history.
- Reddit
- And a second one the same day.
- Microsoft Office 365
- Gmail