SRE Weekly Issue #156

A message from our sponsor, VictorOps:

DevOps and SRE go hand-in-hand. See how building a DevOps culture of transparency and collaboration can inherently lead to proactive SRE efforts – and ultimately, more reliable systems:

http://try.victorops.com/sreweekly/devops-leads-to-inherent-sre

Articles

Lots of companies seem to be redesigning their status pages lately. I love learning what was wrong with the old one and what they’ve changed to try to fix it.

Benjamin Stein — Twilio

A cringe-worthy story of a system failure (thankfully not production!) along with some ideas on preventing such failures.

Dan Woods

Just like last year, Catchpoint will donate $5 to charity if you take their survey!

This year we are back with a focus on outages and incidents. What impact do incidents have on the organization and the people responding to the incidents? How does this change across industry and organization?

Catchpoint

You can do a lot better than “the server is unhappy.” Be on the lookout for language like that: it’s usually a good learning opportunity or, at the very least, a sign of instrumentation gaps worth filling.

Arya Asemanfar — LightStep

Outages

Updated: January 20, 2019 — 8:11 pm
A production of Tinker Tinker Tinker, LLC