Lots of companies seem to be redesigning their status pages lately. I love learning what was wrong with the old one and what they’ve changed to try to fix it.
Benjamin Stein — Twilio
A cringe-worthy story of a system failure (thankfully not production!) along with some ideas on preventing such failures.
Just like last year, Catchpoint will donate $5 to charity if you take their survey!
This year we are back with a focus on outages and incidents. What impact do incidents have on the organization and on the people responding to them? How does this change across industries and organizations?
You can do a lot better than “the server is unhappy.” Be on the lookout for language like that. It’s usually a good learning opportunity or, at the very least, a good time to fill some gaps in instrumentation.
Arya Asemanfar — LightStep