SRE Weekly Issue #176

Articles

Distributed Tracing — we’ve been doing it wrong

[…] spans are too low-level to meaningfully be able to unearth the most valuable insights from trace data.

Find out why current distributed tracing tools fall short and the author’s vision of the future of distributed tracing.

Cindy Sridharan

Why Every Company Can Benefit from a Blameless Culture

If I wanted to introduce the concept of blameless culture to execs, this article would be a great starting point.

Rui Su — Blameless

The Multiple Audiences and Purposes of Post-Incident Reviews

When we look closely at post-incident artifacts, we find that they can serve a number of different purposes for different audiences.

John Allspaw — Adaptive Capacity Labs

This major internet routing blunder took A WEEK to fix. Why so long? It was IPv6 – and no one really noticed

When you meant to type /127 but entered /12 instead

Oops?

Automating chaos experiments in production

The early failure injection testing mechanisms from Chaos Monkey and friends were like acts of random vandalism. Monocle is more of an intelligent probing, seeking out any weakness a service may have.

There’s a great example of Monocle discovering a mismatched timeout between client and server and targeting it for a test.

Adrian Colyer (summary)

Basiri et al., ICSE 2019 (original paper)

The Configuration Complexity Clock

Take the axiom of “don’t hardcode values” to an extreme, and you end up right back where you started.

Mike Hadlow

Outages

Cloudflare
- Cloudflare suffered a massive outage, returning 502 responses for over 80% of traffic for over 20 minutes. Linked above is their analysis. A tweet thread involving their CEO is also illuminating.
Instagram
Twitter
Google Maps
iCloud
Tweetdeck
Azure
- Azure suffered an outage in San Jose, CA, USA on July 2.
