…human error as a root cause isn’t where you should end, it’s where you should start your investigation.
The rules are:
- Learn, don’t blame
- Know the scope of the system
- Make sure you have all the relevant logs
- Make sure the logs line up with the timeline
- Separate the noise from the information
- Make sure the biases are known
- Make sure you deal in facts and not counterfactuals
…Cloud Pub/Sub was being advertised as beta software; we were unaware of any organisation other than Google who were using it at our scale.
Reports say the hackers executed approximately 100 million login attempts, and almost 21 million of these turned out to be successful.
- Xbox Live
- Fox and ABC News
Two large news sites suffered brief outages on Super Tuesday, an important voting day in the US. Both were apparently taken out by a failure at the analytics provider they share.
- The Pirate Bay
- EE webmail
Miscommunication is cited in this construction-induced fiber cut.
- The Division (game)
- The KKK
Staminus, a DDoS protection company, suffered a huge data breach that exposed full names and credit card numbers. The attackers also took down their infrastructure, causing an outage for big-name clients such as the KKK.