Articles
Lots of companies seem to be redesigning their status pages lately. I love learning what was wrong with the old one and what they’ve changed to try to fix it.
Benjamin Stein — Twilio
A cringe-worthy story of a system failure (thankfully not production!) along with some ideas on preventing such failures.
Dan Woods
Just like last year, Catchpoint will donate $5 to charity if you take their survey!
This year we are back with a focus on outages and incidents. What impact do incidents have on the organization and the people responding to the incidents? How does this change across industry and organization?
Catchpoint
You can do a lot better than “the server is unhappy.” Be on the lookout for language like that. It’s usually a good learning opportunity or at the very least a good time to fill some gaps in instrumentation.
Arya Asemanfar — LightStep
Outages
- Sling TV
- UK’s Criminal Justice Secure eMail system (CJSM)
- Amazon.com
- Amazon Alexa
- Fastly Status – [Retrospective] Elevated Errors in Ashburn (IAD/BWI/DCA)
- Also this one. Full disclosure: Fastly is my employer.