SRE Weekly Issue #159

A huge thanks to my awesome former coworker Greg Burek, whose helpful link contributions make up fully half of this issue. Thanks, Greg!

A message from our sponsor, VictorOps:

Are you an SRE working with Microsoft Azure? Learn more about the key services offered in Azure and how SRE teams can leverage these tools and applications to build and deploy reliable services at a consistent pace:

http://try.victorops.com/sreweekly/microsoft-azure

Articles

This paper discusses the ways in which automation of industrial processes may expand rather than eliminate problems with the human operator.

My favorite bit of irony: presenting data to the user in the most readily understood manner makes them less likely to remember it, so perhaps the most easily grasped display is not actually the best!

Lisanne Bainbridge

Like malice and incompetence, laziness should be far off our radar when we investigate an incident. I hope that reading this article opens minds about the true scope of blamelessness.

Devon Price

Whether or not you agree with this particular attempt at defining what a Systems Engineer (or SRE or anything related) is, it’s worth thinking about and discussing. Our field is evolving quickly, and titles are a moving target.

Matt Ouille

Driven by a desire to update their 737 without causing airlines to have to retrain pilots, Boeing seemingly kept pilots in the dark about what may have been an important little detail of how the new 737 Max operates, with a tragic result.

James Glanz, Julie Creswell, Thomas Kaplan and Zach Wichter — New York Times

An experienced SRE will develop an innate skepticism of new technologies, even if they don’t realize it. This article provides an excellent list of questions to help articulate that skepticism when evaluating a potential design.

Kellan Elliott-McCrea

Auto-scaling isn’t all roses. Like any tool, you have to understand how it works in order to avoid the pitfalls. Read this article to learn what these folks learned the hard way.

Tyson Mote — Segment

Transitioning to a blameless culture can be difficult, especially as folks might blame each other for forgetting to be blameless!

Rachael Byrne — PagerDuty

Many of the old arguments for not instrumenting code (mostly about performance) no longer apply, and a host of new arguments push toward structured events.

Charity Majors

Outages

Updated: February 10, 2019 — 8:34 pm