This article is a condensed version of a talk, but it stands firmly on its own. Their Production-Grade Infrastructure Checklist is well worth a read.
Yevgeniy Brikman — Gruntwork
More and more, the reliability of our infrastructure is moving into the realm of life-critical.
Thanks to Richard Cook for this one.
Linda Comins — The Intelligencer
Detailed notes on lots of talks from SRECon, with a great sum-up at the top discussing the major themes of the conference.
Drawing from an @mipsytipsy Twitter thread from back in February, this article is a great analysis of why it’s right to put developers on call and how to make it humane. I especially like the part about paying extra for on-call, a practice I’ve been hearing more mentions of recently.
Really? Never? I could have sworn I remembered reading about power outages…
Yevgeniy Sverdlik — DataCenter Knowledge
Lots of good stuff in this one about preventing mistakes and analyzing failures.
Rachel Bryan — Swansea University