SRE Weekly Issue #157

A message from our sponsor, VictorOps:

See how VictorOps built their SRE efforts from scratch and structured SRE operations across a smaller team. Developing a culture of collaboration and accountability takes time and effort – but it makes all the difference:

http://try.victorops.com/sreweekly/building-a-culture-of-sre

Articles

The best article about post-incident investigations that I’ve seen in a while. My favorite part is the recommendation not to use a template for the retrospective, as it will artificially narrow the scope of the investigation.

Ryan Frantz

These folks have set up a survey to gather information on whether and how folks are compensated for on-call in IT. This topic has been gaining traction over the past couple of years, and I can’t wait to see the results of the survey. Please take a moment to fill it out.

Chris Evans and Spike Lindsey

I’ll be speaking at SRECon19 Americas this March with my former coworker, Courtney Eckhardt. The talk lineup looks incredible and I’m really excited to go!

If you’re going to be there, drop me an email (I’m terrible at Twitter) and let me know. I’ll have lots of swag available, made with 100% open source software (Ink/Stitch and inkscape-silhouette).

Especially useful for folks new to on-call.

If you only take one thing away from this post, it’s that you need to put your own well-being first, and once you do that other aspects of on-call will become easier.

Dave Fennell — Hosted Graphite

I have to admit I wasn’t clear on two-phase commit before I read this. Now I know what it’s all about — and its drawbacks.

Daniel Abadi

This guide from Google describes the qualities and practices of SRE teams of various levels from beginner to advanced.

Gustavo Franco — Google

A good intro if you’re new around here.

Sylvia Fronczak — Scalyr

Outages

Updated: January 27, 2019 — 8:18 pm
A production of Tinker Tinker Tinker, LLC