SRE Weekly Issue #171

A message from our sponsor, VictorOps:

[You’re Invited] Puppet, Splunk and VictorOps are teaming up for a live webinar on powering continuous improvement by combining analytics, incident response and automation. Learn best practices for releasing better applications faster, without the fire drills.

http://try.victorops.com/sreweekly/continuous-improvement-webinar

Articles

TL;DR: Prefer investing in recovery instead of prevention.

Make failure a non-event, rather than trying to prevent it. You won’t succeed in fully preventing failures, and in trying you’ll get out of practice at recovering.

Aaron Blohowiak

They had me at “normalization of deviance”. I’ll read pretty much anything with that in the title.

Tim Davies — Fast Jet Performance

Monzo’s system is directly integrated with Slack, helping you manage your incident and track what happens. Check out their video presentation for more details.

Monzo

Me too! Great thread.

Nolan Caudill and others

I love Honeycomb incident reviews, I really do.

Douglas Soo

Born from a Twitter argument thread, this article goes into depth about why Friday change freezes can do more harm than good.

Charity Majors

Outages

Updated: May 5, 2019 — 9:46 pm
A production of Tinker Tinker Tinker, LLC