Here’s another great article urging caution when adopting new tools. Codeship’s Jessica Kerr places technologies on a continuum of risk, from single-developer tools all the way up to new databases. She goes into excellent detail, providing examples of how adopting a new technology can come back to bite you.
After several recent incidents of nations cutting off or severely curtailing internet connectivity, the UN took a stand, as reported in this Register article:
The United Nations officially condemned the practice of countries shutting down access to the internet at a meeting of the Human Rights Council on Friday.
Is it possible to design an infrastructure and/or security environment in which a rogue employee cannot take down the service?
Mathias Lafeldt is back in this latest issue of Production Ready. In part 1 of 2, he reviews Richard Cook’s classic How Complex Systems Fail, with an eye toward applying it to web systems.
And with a nod to Lafeldt for the link, here’s another classic from John Allspaw on complexity of failures.
In the same way that you shouldn’t ever name “human error” as a root cause, if you only have a single root cause, you haven’t dug deep enough.
SGX released a postmortem for their mid-July outage in the form of a press release. Just as Allspaw tells us, the theoretically simple root cause (disk failure) was exacerbated by a set of complicating factors.
In this recap of a joint webinar, Threat Stack and VictorOps share 7 methods to avoid and reduce alert fatigue.
- Zen (UK ISP)
Petnet suffered a server outage that prevented their smart feeders from feeding customers’ pets for hours.
Last week’s British Telecom outage resulted in the cards of 102 customers at IKEA’s brick-and-mortar stores being double-charged.
- Pokemon GO
Thanks to Jonathan Rudenberg for this one.
This one slipped through my normal news collection. Fortunately(?) I caught it while trying to make a purchase on Amazon.
- Amazon Prime Instant Video