SRE Weekly Issue #444

A good day to trie-hard: saving compute 1% at a time

When you’re doing something 60 million times per second, even a modest optimization makes a huge difference.

Kevin Guthrie — Cloudflare

Pushy to the Limit: Evolving Netflix’s WebSocket proxy for the future

Meet Pushy, Netflix’s WebSocket-based push system with an impressive five nines of reliability in message delivery.

Karthik Yagna, Baskar Odayarkoil, and Alex Ellis — Netflix

If your early-stage startup can’t afford an observability solution from a vendor, you could try rolling your own. This article has an overview and pointers but stops short of explicit instructions.

Malay Hazarika — Osuite

AI agents invade observability: snake oil or the future of SRE?

With AI SRE “agents” cropping up everywhere, what should we think? Here’s an overview of what’s going on with links to read more.

Clay Smith — Monitoring Monitoring

Battle of the RabbitMQ Queues: Classic and Quorum

An overview of the two kinds of RabbitMQ queues along with performance numbers from load tests.

Josephine Eskaline Joyce and Anilkumar Mallakkanavar — DZone

Advancing Our Chef Infrastructure

In this blog post, I’ll discuss the evolution of our Chef infrastructure over the years and the challenges we encountered along the way.

Archie Gunasekara — Slack

How and Why We Made SREBench, SWEBench for Kubernetes

Using LLMs to generate test cases to test an AI agent’s ability to diagnose Kubernetes problems, with a kubectl simulator running on an LLM. Whew, that’s a lot of AI!

Jeffrey Tsaw — Parity

Thoughts From The First SEV0 Conference

I was having some major FOMO last week, so this recap of the SEV0 incident management conference is especially welcome.

Amin Astaneh — Certo Modo
