SRE Weekly Issue #257

A message from our sponsor, StackHawk:

Keeping your APIs secure requires thoughtful design and testing. Learn how to protect your REST, SOAP, and GraphQL APIs from security vulnerabilities with StackHawk.
http://sthwk.com/api-protection

Articles

This one really got me thinking. Make sure you document why an alert exists, not just what it checks for.

Chris Siebenmann

If you start with a monolith and adopt a microservice architecture, your incident response process will need to change as well.

Mya Pitzeruse — effx

Another one that needs a disclaimer: there’s no single “root cause” for an incident, and this article is not about that. This is about using statistical software to aid humans in debugging by looking at the activities performed by different users before they encounter a given bug.

Vijay Murali, Edward Yao, Umang Mathur, Satish Chandra — Facebook

A new SRE at Honeycomb shares insight on the job and SRE attitudes in general.

Fred Hebert — Honeycomb

This post considers the January 4th Slack outage as a set of cases of saturation.

Lorin Hochstein

Outages

Updated: February 14, 2021 — 8:39 pm
A production of Tinker Tinker Tinker, LLC