Articles
In case you, like me, weren’t familiar with the Saga pattern: it’s basically a pseudo-transaction across multiple microservices. Here’s why it might not be a great idea.
Sergiy Yevtushenko
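For readers new to the idea, here is a minimal sketch of what a saga-style "pseudo-transaction" can look like: each step pairs an action with a compensating action, and a failure triggers the compensations for the steps that already completed, in reverse order. The names (`run_saga`, `reserve_stock`, `charge_card`) are hypothetical, the steps are plain in-process functions standing in for calls to separate services, and this is not taken from the linked article.

```python
from typing import Callable, List, Tuple

# (name, action, compensating action); all three are illustrative placeholders.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run each action in order; on failure, undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()
        except Exception:
            # There is no real rollback: earlier steps already took effect,
            # so each one is undone by its own compensating action.
            # A real system also has to cope with a compensation failing.
            for _, _, undo in reversed(completed):
                undo()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_payment() -> None:
    raise RuntimeError("payment declined")


ok = run_saga([
    ("reserve_stock",
     lambda: log.append("reserve_stock"),
     lambda: log.append("release_stock")),
    ("charge_card", fail_payment, lambda: log.append("refund_card")),
])
print(ok, log)
```

Note that between the first action and its compensation, other services can observe the intermediate state, which is one way a saga differs from a real transaction.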
During a rolling deploy, different parts of the infrastructure were briefly running either old or new code, with unexpected results.
Andrew Ayer
On its face, we have a simple requirement:
- Generate sequential numbers
- Ensure that there can be no gaps
- Do that in a distributed manner
It’s never simple with distributed systems.
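To make the "no gaps" requirement concrete, here is a minimal single-process sketch, not the linked article's approach. It assumes a hypothetical `GaplessSequence` class in which a number only counts as used once the work tied to it succeeds; the lock stands in for whatever coordination a distributed system would need, which is exactly the hard part.

```python
import threading
from typing import Callable, Optional


class GaplessSequence:
    """Hypothetical helper: hands out 1, 2, 3, ... without skipping a value."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = 0

    def assign(self, work: Callable[[int], None]) -> Optional[int]:
        """Run work(n) with the next number; consume n only if work succeeds."""
        with self._lock:
            candidate = self._last + 1
            try:
                work(candidate)
            except Exception:
                # The number was never committed, so the next caller reuses it.
                return None
            self._last = candidate
            return candidate


def failing_work(n: int) -> None:
    raise RuntimeError(f"could not store record {n}")


seq = GaplessSequence()
results = [
    seq.assign(lambda n: None),  # succeeds
    seq.assign(failing_work),    # fails, number is not consumed
    seq.assign(lambda n: None),  # succeeds with the number the failure left
]
print(results)
```

The cost of this design is that every assignment is serialized behind one lock for the full duration of the work, which hints at why the distributed version is harder.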
In classic Cloudflare style, here’s an ultra-deep dive into the kernel to find the source of trouble-making packet loss.
Terin Stock — Cloudflare
Even with a “duplicate” incident, there’s always at least one thing that’s different: the fact that it’s happened before. That changes things. In practice, a lot more will be different too.
Fred Hebert — Honeycomb
Full disclosure: Honeycomb is my employer.
There are definitely pros and cons to being in the most popular (and most often maligned) AWS region.
Jeff Martens — Metrist
Changes are frequent causes of incidents, but what exactly counts as a change? This article delves into that with examples.
Boris Cherkasky
This crash is a great reminder that we have to look past “human error” to the systems around the humans that set them up for failure (or don’t set them up for success).
Admiral Cloudberg