SRE Weekly Issue #99

Lots of outages this week, although not as many as on Black Friday in some previous years.  We’ll see what Cyber Monday brings.

I’m writing this from the airport on my way to re:Invent.  Perhaps I’ll see some of you there as I rush about from meeting to meeting.


Attending AWS re:Invent 2017? Visit the VictorOps booth, schedule a meeting, or join us for some after hours fun. See you in Vegas!


Complete with a nifty flow-chart for informed decision-making.

As the title suggests, this article by New Relic is about the mindset of an SRE. I really love number 3, where they discuss the idea that gating production deploys can actually reduce reliability rather than improve it.

It’s what it says on the tin, and it’s targeted at DigitalOcean. One could also use this as a general primer on setting up Heartbeat failover on other cloud platforms.

The Chaos Toolkit is a free, open source project that enables you to create and apply Chaos Experiments to various types of infrastructure, platforms and applications.

It currently supports Kubernetes and Spring.

Here’s a neat little overview of the temporary but massive network that joins the re:Invent venues up and down the Las Vegas strip. Half of the strip is also set up for Direct Connect to the nearest AWS region.

The article discusses three pitfalls: confusing EBS latency, idle EC2 instances wasting money, and memory leaks. My favorite gotcha isn’t mentioned: performance cliffs caused by running out of burst credits on T2 instances or gp2 volumes.
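
For a rough feel of where the gp2 cliff sits, here is a back-of-the-envelope sketch. It assumes the gp2 figures AWS documented around this time (a 5.4 million I/O credit bucket, 3 IOPS of baseline per GiB with a floor of 100, and a 3,000 IOPS burst ceiling). The function names are mine, purely for illustration, and the model ignores everything except steady sustained demand starting from a full bucket.

```python
# Assumed gp2 parameters (as documented by AWS circa 2017); verify against
# current documentation before relying on them.
BUCKET_CREDITS = 5_400_000   # I/O credits in a full burst bucket
BURST_IOPS = 3_000           # maximum burst rate
BASE_IOPS_PER_GIB = 3        # credits earned per second, per GiB of volume
MIN_BASE_IOPS = 100          # baseline floor for small volumes


def baseline_iops(size_gib: int) -> int:
    """Baseline (refill) rate for a volume of the given size."""
    return max(MIN_BASE_IOPS, BASE_IOPS_PER_GIB * size_gib)


def seconds_until_cliff(size_gib: int, demand_iops: int = BURST_IOPS) -> float:
    """Seconds a full bucket lasts under sustained demand.

    Returns infinity when demand never exceeds the baseline, since the
    bucket then refills at least as fast as it drains.
    """
    demand = min(demand_iops, BURST_IOPS)        # cannot exceed the burst cap
    drain = demand - baseline_iops(size_gib)     # net credits spent per second
    if drain <= 0:
        return float("inf")
    return BUCKET_CREDITS / drain


if __name__ == "__main__":
    for size in (10, 100, 500, 1000):
        print(size, "GiB:", seconds_until_cliff(size), "seconds at full burst")
```

Under these assumptions, a 100 GiB volume driven at full burst drains its bucket in 2,000 seconds, or about 33 minutes, after which throughput drops to its 300 IOPS baseline. That is the cliff: everything looks fine in a short load test and falls over half an hour into a real incident.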


Updated: November 26, 2017 — 2:37 pm
SRE WEEKLY © 2015 Frontier Theme