Articles
In the months before the 2016 Pulse nightclub mass shooting in Orlando, Florida (US), a plan was in the works to get victims out of a “hot” zone. The story of why it wasn’t implemented echoes the kinds of organizational failings we see as SREs.
Abe Aboraya — ProPublica
Facebook is at it again! Here’s a new system based on a state machine driven by Chef.
Declan Ryan — Facebook
Google has published a new guide to designing disaster recovery (DR) in Google Cloud Platform:
We’ve put together a detailed guide to help steer you through setting up a DR plan. We heard your feedback on previous versions of these DR articles and now have an updated four-part series to help you design and implement your DR plans.
Grace Mollison — Google
[…] you must be part of the team working on the system. You cannot be someone that hurts a system and then wait for others to fix the problem.
Jan Stenberg — InfoQ
If you’ve ever been woken in the middle of the night just to see that an alert could be solved by adding another server or two to the loadbalancer, you need capacity plans and you need them yesterday.
Evan Smith — Hosted Graphite
[…] our industry has finally reached the tipping point at which it has become viable to build distributed systems from scratch, at a fast pace of iteration and low cost of operation, all while still having a small team to execute
The author argues that it’s possible to avoid building tech debt while still retaining the velocity a new startup needs.
Santiago Suarez Ordoñez — Blameless, Inc.
A walk through the evolution of database availability: from a single host, to a bigger host, to leader/follower replication and active/active setups. The distinction drawn between active/active and “Multi-Active” is worth the read.
Sean Loiselle — Cockroach Labs
Outages
- Crowdpac (crowd-funding site)
- Crowdpac briefly went down as visitors swarmed the site to donate to a campaign raising funds for US Senator Susan Collins’s future opponent. The campaign was a response to her controversial vote to confirm (now-)Justice Kavanaugh.
- AWS (us-west-2)
- Ecobee (home automation)
- German Parliament’s IT system
- Cisco Webex