SRE Weekly Issue #32

Articles

It’s tempting to use the newest shiny stack when building a new system. Dan McKinley argues that you should limit yourself to a few shiny technologies to avoid an excessive operational burden.

[…] the long-term costs of keeping a system working reliably vastly exceed any inconveniences you encounter while building it.

Postmortem-Report-Reviews/2016-07-20-pshima-stack-exchange-2016-07-20.md

Quick on the draw, Pete Shima gives us a review of Stack Exchange’s outage postmortem (linked below) as part of the Operations Incident Board’s Postmortem Report Reviews project. Thanks, Pete!

Chaos Community Day

Next month, Seattle will host the second annual Chaos Community Day, an event full of presentations on chaos engineering. I wish I could attend!

Lives at risk during nationwide weather service meltdown

As the world becomes more and more dependent on the services we administer, outages become more and more likely to put real people in danger. Here’s a rundown of how dangerous last week’s four-hour outage at the US National Weather Service was.

What We Don’t Get About Microsoft Azure

An interesting opinion piece arguing that Microsoft Azure is more robust than Google’s and Amazon’s offerings.

4 Software Quality Lessons From Pokemon Go’s Wild First Week – DZone Performance

This week, I’m trying to catch all the articles being written about Pokémon GO. Here’s one suggesting that the problems may stem from insufficient testing.

Niantic And Nintendo’s Lack Of Communication About ‘Pokémon GO’ Issues Is Inexcusable

Pokémon GO is blowing up like crazy, and I don’t just mean in popularity. Forbes has a lot to say about the complete lack of communication during and after outages, and we’d do well to listen. This article reads a lot like a recipe for how to communicate well with your user base about outages.

Netflix Billing Migration to AWS – Part II

Here’s the continuation of last month’s article on Netflix’s billing migration.

Outages

Scalr
Southwest Airlines
British Telecom
- Interestingly, datacenter provider Equinix volunteered that they caused the outage. BT had a second outage the next day.
Stack Exchange
- Linked is a really interesting postmortem about an unexpectedly inefficient regex. Nice work posting this so quickly, Stack Exchange!
Pokemon Go
National Science Foundation (US)
WorldPay
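The Stack Exchange postmortem in the list above concerns a regex that turned out to be unexpectedly inefficient. As a minimal sketch of how that can happen, assuming a trailing-whitespace pattern like `\s+$` (a hypothetical stand-in chosen for illustration, not the exact expression from the postmortem), the Python below contrasts a backtracking regex with a single backwards scan. When a long whitespace run is followed by a non-whitespace character, the regex engine retries from every position in the run, so its work grows roughly quadratically with the run length, while the scan stays linear.

```python
import re
import time

# Hypothetical trailing-whitespace pattern, used only for illustration;
# it is not copied from Stack Exchange's code.
TRAILING_WS = re.compile(r"\s+$")


def trim_with_regex(text: str) -> str:
    """Remove trailing whitespace using a backtracking regex.

    For input such as "a" + " " * n + "b", the engine starts a match at each
    of the n spaces, greedily consumes the rest of the run, fails to reach
    the end of the string, and backtracks one character at a time. That is
    O(n) work per start position, O(n^2) overall.
    """
    return TRAILING_WS.sub("", text)


def trim_linear(text: str) -> str:
    """Remove trailing whitespace by scanning backwards once: O(n)."""
    end = len(text)
    while end > 0 and text[end - 1].isspace():
        end -= 1
    return text[:end]


def time_call(fn, text: str) -> float:
    """Return the wall-clock seconds taken by a single call to fn(text)."""
    start = time.perf_counter()
    fn(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pathological input: a long whitespace run that is NOT at the end.
    # Doubling n should roughly quadruple the regex time.
    for n in (1_000, 2_000, 4_000):
        text = "a" + " " * n + "b"
        print(f"n={n:>5}  regex={time_call(trim_with_regex, text):.4f}s  "
              f"linear={time_call(trim_linear, text):.6f}s")
```

In everyday Python code, `str.rstrip()` with no arguments gives the same result as `trim_linear` in linear time; the explicit loop is only there to make the cost visible.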
