SRE Weekly Issue #262

A message from our sponsor, StackHawk:

Join the Secure Coding Summit to hear from industry-leading AppSec and DevSecOps practitioners, analysts, and visionaries as they share their best pro tips to level up your code security.
http://sthwk.com/secure-code-summit

Articles

Chaos Engineering isn’t adding chaos to your systems—it’s seeing the chaos that already exists in your systems.

Along with four prerequisites, this article also includes 3 myths about chaos engineering that might be making you feel hesitant about starting.

Courtney Nash — Verica

This one’s from May of last year. Almost a year on, it’s interesting to see which of these we’ve already implemented.

Ashley Roof — Transposit

An amusing parable illustrating why not to try to be too reliable.

Andrew Ford — Indeed

In the Outages section of last week’s issue, you’ll find two unrelated events referenced in this article: one about Russian internet censorship gone awry and another about a major datacenter fire.

Eric Johansson — Verdict

Along with what’s in the title, this article also covers the difference between an RCA and a contributing factors analysis.

Emily Arnott — Blameless

Lots of detail on how LinkedIn is improving their traffic forecasts. Warning/enticement: math contained within.

Deepanshu Mehndiratta — LinkedIn

Everyone is testing in production, some organizations admit and plan for it.

How to do it right, what can happen if it goes wrong, and how to limit the blast radius.

Heidi Waterhouse — LaunchDarkly

Remember when GitHub logged you out? Ah, I remember it like it was last week. I mean, the week before. Here’s GitHub’s troubleshooting story about what went wrong.

Dirkjan Bussink — GitHub

Outages

Updated: March 21, 2021 — 9:26 pm
A production of Tinker Tinker Tinker, LLC Frontier Theme