SRE Weekly Issue #478

Security and SRE: How Datadog’s combined approach aims to tackle security and reliability challenges

Datadog has fully merged their SRE and Security teams.

In this post, we’ll look at essential elements of SRE and security, the benefits we’ve realized by combining the two disciplines, and what that approach looks like for us.

Bianca Lankford — Datadog

What I Really Mean When I Say “Good Communication” in Incident Response

I love the way this article describes three different audiences for your communication during incidents. It describes what each audience is looking for and gives both positive and negative examples of how to communicate with them.

Hamed Silatani — Uptime Labs

Load testing: Prepare for the growth you dream of!

My favorite part of this article is the section on where to run your load tests: production, staging, or something else?

Tom Elliot

Working on Complex Systems: What I Learned Working at Google

What is complexity? This article gives a clear definition and breaks down the qualities one can find in a complex system. Then it goes over various methods of dealing with that complexity.

Teiva Harsanyi — The Coder Cafe

QUIC restarts, slow problems: udpgrm to the rescue

Cloudflare has a history of doing some pretty interesting things with sockets in Linux, and of taking us along for the journey with highly detailed explanations. This article is no exception, sharing the unique challenges encountered when restarting processes that handle UDP streams.

Marek Majkowski

Do not deploy on Friday!

This article examines the standard Friday deploy prohibition and ultimately pushes back.

Ok… but why not?

Adrien Guéret — OpenClassrooms

Google SREs are changing the game again: a breakdown of their new approach

This article introduces the STAMP (System-Theoretic Accident Model and Processes) framework being adopted at Google, after first explaining the shortcomings in traditional SRE practices that prompted Google to adopt STAMP.

Jorge Lainfiesta — Rootly

Labeling a root cause is predicting the future, poorly

I really love this framing of what’s wrong with picking a single root cause.

Lorin Hochstein
