When you’re doing something 60 million times per second, even a modest optimization makes a huge difference.
Kevin Guthrie — Cloudflare
Meet Pushy, Netflix’s websocket-based push system with an impressive five nines of reliability in message delivery.
Karthik Yagna, Baskar Odayarkoil, and Alex Ellis — Netflix
If your early-stage startup can’t afford an observability solution from a vendor, you could try rolling your own. This article has an overview and pointers but stops short of explicit instructions.
Malay Hazarika — Osuite
With AI SRE “agents” cropping up everywhere, what should we think? Here’s an overview of what’s going on with links to read more.
Clay Smith — Monitoring Monitoring
An overview of the two kinds of RabbitMQ queues along with performance numbers from load tests.
Josephine Eskaline Joyce and Anilkumar Mallakkanavar — DZone
In this blog post, I’ll discuss the evolution of our Chef infrastructure over the years and the challenges we encountered along the way.
Archie Gunasekara — Slack
Using LLMs to generate test cases that evaluate an AI agent’s ability to diagnose Kubernetes problems, with a kubectl simulator that itself runs on an LLM. Whew, that’s a lot of AI!
Jeffrey Tsaw — Parity
I was having some major FOMO last week, so this recap of the SEV0 incident management conference is especially welcome.
Amin Astaneh — Certo Modo