SRE Weekly Issue #487

Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 1 of 2)

Pinterest decided to replace their Hadoop+Spark-based data processing pipeline with one based on Kubernetes.

In part one, we provide rationale for our new technical direction prior to outlining the overall design and detailing the application focused layer of our platform. We conclude with current status and some of our learnings.

Soam Acharya, Rainie Li, William Tom, and Ang Zhang — Pinterest

How AI-generated code is quietly increasing system risk

This article raises some important concerns that are worth thinking about.

It’s fast and feels efficient, but it masks a drop in codebase familiarity. Over time, your top engineers stop being system experts.

Alexander Procter — Okoone

Avoiding the ironies of automation

I really love the care taken in this article to consider the potential risks of AI tools for incident response. There are many valuable insights that make this article way more than just a sales pitch for their tool.

Chris Evans — incident.io

Quicksilver v2: evolution of a globally distributed key-value store (Part 2)

Quicksilver is a globally distributed key-value store serving billions of requests per second, where speed is critical, so you know the scaling challenges are going to be interesting.

Marten van de Sanden and Anton Dort-Golts — Cloudflare

Practical Problems with Auto-Increment

This article gives reproducible cases in which MySQL and Postgres can reuse auto-increment IDs.

I think I’ve seen this advice violated at nearly every company I’ve worked at:

Best practice dictates that you shouldn’t be using IDs from database tables outside of that table unless it’s some foreign key field

Sam Rose

Choosing Between Count and For-Each

Here’s a great explanation of why it’s often better to use for_each instead of count in Terraform.

Ned Bellavance

How we tracked down a Go 1.24 memory regression across hundreds of pods

This debugging story really drew me in. It’s so incredibly satisfying the way their initial theory was confirmed so tidily in the end.

Nayef Ghattas — Datadog

The Art of Not Getting Woken Up for Nothing

In our latest Rootly roundtable, we sat down with a group of seasoned SREs (collectively packing over 100 years of ops scars) to trade notes on what makes an alert useful, what makes it noise, and how to build alerting systems that teams can trust.

Here are their top strategies distilled for you:

Jorge Lainfiesta — Rootly
