Complacency is the enemy of resilience. The longer you wait for disaster to strike in production — merely hoping that everything will be okay — the less likely you are to handle emergencies well, both at a technical and organizational level.
However, when Google had its recent 12-hour outage that took Snapchat offline, it didn’t impact any of Google’s real revenue-generating services. […] What would the impact have been if Google Search was down for 12 hours?
Thanks to Charity for this one.
Note that there’s been some question on hangops #sre on whether this is a hoax. Either way I could totally see it happening.
- Yahoo Mail
- Business Wire
- Google Compute Engine
GCE suffered a severe network outage. It started as increased latency and at worst became a full outage of internet connectivity. Two days after the incident, Google released the best postmortem I’ve seen in a very long time. Full transparency, a terrible juxtaposition of two nasty bugs, a heartfelt apology, fourteen(!) remediation items… it’s clear their incident response was solid and they immediately did a very thorough retrospective.
- North Korea
North Korea had a series of internet outages, each of the same length at the same time on consecutive days. It’s interesting how people are trying to learn things about the reclusive country just from this pattern of outages.
- Blizzard's Battle.net
- Two Alt-Coin exchanges (Shapeshift and Poloniex)
- Home Depot