SRE Weekly Issue #220

A message from our sponsor, StackHawk:

Hi, SRE Weekly. We’re your new newsletter sponsor, StackHawk. We believe that application security is an important part of reliability engineering, and we’re building tooling to support that. We’d love for you to check us out.
https://www.stackhawk.com?utm_source=SREWeekly

Articles

Catchpoint is holding a mini-conference on the ways that SRE has changed as we shift to all-remote work, and I’m super-excited to be on the Q&A panel! Hope to see you there.

Catchpoint

A seasoned pro draws on hard-won experience to discuss some pitfalls of cloud-based architecture.

Rachel by the bay

Monzo is back with updates on how their on-call has changed since their original article in 2018.

Shubheksha Jalan — Monzo

Along with this rockin’ article about why it’s important to make on-call bearable, Incident Labs also has a survey on your on-call experience. Click through for the link.

Incident Labs

This really crystallizes a lot of my concerns with anomaly detection.

Danyel Fisher — The New Stack / Honeycomb

If you ask someone why they did something, they’re likely to invent a logical-sounding reason without meaning to.

Lorin Hochstein

Outages

Updated: May 24, 2020 — 9:27 pm
A production of Tinker Tinker Tinker, LLC