SRE Weekly Issue #142

SPONSOR MESSAGE

Becoming a reliability engineer takes a unique set of skills and a breadth of knowledge. See what it takes to become an SRE, and use this as a resource to quickly ramp up new SREs:

http://try.victorops.com/sreweekly/becoming-a-reliability-engineer

Articles

The big news this week is the story from Bloomberg alleging a spy chip on SuperMicro motherboards. I say “alleging” because Amazon and Apple have issued unequivocal denials.

Jordan Robertson and Michael Riley — Bloomberg

In the months before the 2016 Pulse nightclub mass shooting in Florida (US), a plan was in the works for getting victims out of a “hot” zone. The story of why it wasn’t implemented echoes the kind of organizational failings we see as SREs.

Abe Aboraya — ProPublica

Facebook is at it again! Here’s a new system based on a state machine driven by Chef.

Declan Ryan — Facebook

Google has produced a new guide on designing DR in Google Cloud Platform:

We’ve put together a detailed guide to help steer you through setting up a DR plan. We heard your feedback on previous versions of these DR articles and now have an updated four-part series to help you design and implement your DR plans.

Grace Mollison — Google

[…] you must be part of the team working on the system. You cannot be someone that hurts a system and then wait for others to fix the problem.

Jan Stenberg — InfoQ

If you’ve ever been woken in the middle of the night just to see that an alert could be solved by adding another server or two to the loadbalancer, you need capacity plans and you need them yesterday.

Evan Smith — Hosted Graphite

[…] our industry has finally reached the tipping point at which it has become viable to build distributed systems from scratch, at a fast pace of iteration and low cost of operation, all while still having a small team to execute

The author argues that it’s possible to avoid building tech debt while still retaining the velocity a new startup needs.

Santiago Suarez Ordoñez — Blameless, Inc.

This article walks through the progression from a single host, to a bigger host, to leader/follower replication and active/active setups. The distinction between active/active and “Multi-Active” is worth reading.

Sean Loiselle — Cockroach Labs

Outages

Updated: October 7, 2018 — 8:38 pm
A production of Tinker Tinker Tinker, LLC