SRE Weekly Issue #259

A message from our sponsor, StackHawk:

Mark your calendars! The first conference for OWASP ZAP users is taking place March 9. Get your free ticket to connect with other ZAP users and learn about the project’s roadmap
http://sthwk.com/zapcon-sreweekly

Articles

This quarter’s Increment issue is about Reliability, and I haven’t had this much fun since their first issue about on-call. I’ll include a few of the articles here and more in later issues as I have a chance to review them.

Stripe

Accepting that imperfect things still work is fundamental to preventing failures from becoming catastrophes.

Understanding that no system is without errors is critical to building resilient systems.

Heidi Waterhouse

The very first sentence sets the tone, and I love it:

Resilience is a process: something you must actively perform, not something you check off a list once.

Ryn Daniels

Most of all, having an incident commander only works if everyone believes in the role. Someone stepping in to address a crisis and saying “I’m Batman” doesn’t help unless people have bought into the idea of Batman.

The next time I’m incident commander, I am totally going to jump in and say, “I’m Batman!”.

This article is a great primer on what an IC is and how to adopt incident command at your organization.

Tanya Reilly

After reading this blog post, you will have an understanding of the retry pattern used in microservices architecture, why it should be used, a few considerations while using the retry pattern, and how to use it in Python.

I love the W. C. Fields quote.

Anand Prashant

It’s that time again! Be sure to fill out the survey, not only so they can gather useful data, but also because Catchpoint will donate $5 to charity.

DevOps Institute, Catchpoint, and VMWare Tanzu

When considering the value of a QA test, SLIs can provide very valuable context.

SRE and QA can work hand in hand.

Emily Arnott — Blameless

This kind of thing keeps me up at night. Silent data corruption can destroy your reliability just as quickly as a backhoe on a non-redundant link.

Harish Dattatraya Dixit — Facebook

Etsy experienced years of growth practically overnight in 2020 as quarantines set in. Here’s how they handled it.

Mike Adler — Etsy

Outages

Updated: February 28, 2021 — 8:56 pm
A production of Tinker Tinker Tinker, LLC Frontier Theme