Articles
Friday deploys are going to be necessary occasionally, even if we try to ban them. Doing so will only mean that we’re less experienced at executing Friday deploys successfully.
Will Gallego
Jet engines are Complicated. The system of jet engine maintenance (including the technicians, policies, schedules, etc.) is Complex. Understanding the difference is key to managing complex systems.
Adam Johns
In this issue, we have articles from the front-line, as well as from safety, legal, leadership, human factors and psychology specialists.
Hindsight is a magazine targeted at air traffic controllers. An example article title from this issue:
Mode-Switching in Air Traffic Control
Thanks to Greg Burek for this one.
The US Federal Communications Commission released their report on an outage last December that took down 911 (emergency services) across a large swathe of the US.
This outage was caused by an equipment failure catastrophically exacerbated by a network configuration error.
They’re two separate concepts, but they’re often presented together, blurring the line between them.
Daniel Abadi
I love the idea of applying the ideas of resilience engineering to child welfare services. This article quotes from Hollnagel and Dekker.
Tom Morton and Jess McDonald
Outages
- Amazon Cloud Outage Causing Major Issues at Some Crypto Exchanges – CoinDesk
- Amazon and some Cryptocurrency Exchanges
- AWS had an outage in Asia Pacific, affecting some cryptocurrency exchanges. There’s some speculation that the outage may have resulted in some bitcoins being bought for under 1 USD (way below market value).
- GitHub
- Google OAuth
- Along with preventing logins to some Google services, this also affected “Log in with Google” on non-Google sites.