SRE Weekly Issue #12

Articles

What an excellent resource! This repo contains a pile of postmortems for our reading and learning pleasure. I’m linking to the repo now, but I don’t promise not to call out specific awesome postmortems from it in the future.

When you’re in the trenches trying to get the service back up and running, it can be hard to find the time to tell everyone else in your company what’s going on. It’s critically important though, as Statuspage.io writes in this article.

Full disclosure: Heroku, my employer, is mentioned.

DigitalOcean shares this overview of the basic concepts involved in high availability.

This article discusses a method of computing the availability of an overall system made up of individual components with differing availabilities. It gives general formulas and methods that are fairly simple, yet powerful.
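
To give a feel for this kind of calculation, here is a minimal sketch of the two textbook rules for combining availabilities. It is not taken from the article: the function names are my own, and it assumes component failures are independent, which real systems often violate. Components in series (all must be up) multiply their availabilities; redundant components in parallel (any one is enough) multiply their unavailabilities.

```python
from math import prod  # available in Python 3.8+


def series_availability(availabilities):
    """Availability of a chain where every component must be up.

    Assumes independent failures (an illustrative simplification).
    """
    return prod(availabilities)


def parallel_availability(availabilities):
    """Availability of a redundant group where any one component suffices.

    The group is down only when all members are down at the same time,
    again assuming independent failures.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical system: two redundant 99% web servers in front of
    # a single 99.9% database.
    web_tier = parallel_availability([0.99, 0.99])
    overall = series_availability([web_tier, 0.999])
    print(f"web tier: {web_tier:.6f}")
    print(f"overall:  {overall:.6f}")
```

In this made-up example the redundant web tier comes out better than either server alone, while the overall figure ends up slightly below the database's, since a series chain can never be more available than its weakest link.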

What do you do when you have to modify an existing production system that has less-than-wonderful code quality? This article is an impassioned plea to test the heck out of your changes and always try to release production-quality code the first time.

Google is launching a reverse proxy for DDoS mitigation. Interestingly, it’s available only to news and free speech sites, and it’s completely free.

Outages

Updated: February 28, 2016 — 9:55 pm
A production of Tinker Tinker Tinker, LLC