So I bombed an incident review this week. More specifically, the facilitating.
I love how candid this article is. This kind of story is invaluable for leveling up our own retrospective facilitation skills.
Will Gallego
It turns out that Google Cloud has a distributed tracing offering, and here’s an example of how to set it up.
Punit Sethi
This article explains how 8 popular database systems use synchronized clocks. The systems covered include Spanner, DynamoDB, CockroachDB, and others.
Murat
This article introduces the concept of a hot shard in a distributed system and outlines several strategies for alleviating it.
Sid
Leap seconds can be really dangerous for IT systems! This article explains how the author eased their infrastructure through a leap second by smearing its effect across the preceding day.
rachelbythebay
This article series revisits the underpinnings of the shift toward microservices, with a critical eye. My favorite bit is the analogy for microservice complexity in part 3.
Uwe Friedrichsen
Catchpoint is back with their seventh annual SRE report, and you can download the PDF directly without having to register.
Catchpoint
There are some real gems in here, including my favorite, death by yes.