Articles
Catchpoint is holding a mini-conference on the ways that SRE has changed as we shift to all-remote work, and I’m super-excited to be on the Q&A panel! Hope to see you there.
Catchpoint
A seasoned pro draws on hard-won experience to discuss some pitfalls of cloud-based architecture.
Rachel by the bay
Monzo is back with updates on how their on-call has changed since their original article in 2018.
Shubheksha Jalan — Monzo
Along with this rockin’ article about why it’s important to make on-call bearable, Incident Labs also has a survey on your on-call experience. Click through for the link.
Incident Labs
This really crystallizes a lot of my concerns with anomaly detection.
Danyel Fisher — The New Stack / Honeycomb
If you ask someone why they did something, they’re likely to invent a logical-sounding reason without meaning to.
Lorin Hochstein
Outages
- Statuspage.io
- Google Hangouts
- BitMEX
- GitHub
- Squarespace
- Discord
- Fastly
- Also this one.
Full disclosure: Fastly is my employer.
- Also this one.
- IG Group
- Azure (Central India)
  - Power failure: the datacenter successfully switched to generator power for everything except the air handlers.