Pinterest decided to replace their Hadoop+Spark-based data processing pipeline with one based on Kubernetes.
In part one, we provide rationale for our new technical direction prior to outlining the overall design and detailing the application-focused layer of our platform. We conclude with current status and some of our learnings.
Soam Acharya, Rainie Li, William Tom, and Ang Zhang — Pinterest
This article raises some important concerns that are worth thinking about.
It’s fast and feels efficient, but it masks a drop in codebase familiarity. Over time, your top engineers stop being system experts.
Alexander Procter — Okoone
I really love the care taken in this article to consider the potential risks of AI tools for incident response. There are many valuable insights that make this article way more than just a sales pitch for their tool.
Chris Evans — incident.io
Quicksilver is a globally distributed key-value store serving billions of requests per second, where speed is critical, so you know the scaling challenges are going to be interesting.
Marten van de Sanden and Anton Dort-Golts — Cloudflare
This article gives reproducible cases in which MySQL and Postgres can reuse auto-increment IDs.
I think I’ve seen this advice violated at nearly every company I’ve worked at:
Best practice dictates that you shouldn’t be using IDs from database tables outside of that table unless it’s some foreign key field
Sam Rose
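To see why leaking IDs outside their table is risky, here is a minimal sketch of one well-known reuse mechanism. It is not necessarily one of the article's reproducible cases. In MySQL 5.7 and earlier, InnoDB kept the auto-increment counter only in memory and reinitialized it from MAX(id) after a restart, so deleting the newest rows and restarting could hand out the same IDs again. The class below is a hypothetical in-memory model of that behavior, not a database driver.

```python
class InMemoryAutoIncrementTable:
    """Hypothetical model of a table whose auto-increment counter is not persisted
    (like InnoDB before MySQL 8.0): on restart it is recomputed as MAX(id) + 1."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1

    def insert(self, value):
        # Hand out the next counter value and advance the in-memory counter.
        row_id = self.next_id
        self.rows[row_id] = value
        self.next_id += 1
        return row_id

    def delete(self, row_id):
        del self.rows[row_id]

    def restart(self):
        # The counter is rebuilt from the surviving rows, so the IDs of
        # deleted trailing rows become available again.
        self.next_id = max(self.rows, default=0) + 1


users = InMemoryAutoIncrementTable()
users.insert("alice")
bob_id = users.insert("bob")
external_ref = bob_id          # e.g. stored in a cache or another service
users.delete(bob_id)
users.restart()
carol_id = users.insert("carol")
# external_ref now silently points at carol's row instead of bob's.
print(external_ref == carol_id, users.rows[external_ref])
```

A foreign key inside the same database would have blocked or cascaded the delete; a copy of the ID held elsewhere gets no such protection.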
Here’s a great explanation of why it’s often better to use for_each instead of count in Terraform.
Ned Bellavance
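The core of the for_each versus count argument is how instances are addressed. With count, each instance is addressed by list position (web[0], web[1], …), so removing an element from the middle shifts every later element onto a different address. With for_each, each instance is addressed by its key (web["a"]), so only the removed key goes away. The sketch below is a hypothetical Python simulation of that address diff, not Terraform's real planner. The resource name "web" and the classification into destroy/create/changed are illustrative assumptions.

```python
def count_addresses(names):
    # count: instances are keyed by list index, e.g. web[0], web[1], ...
    return {f"web[{i}]": name for i, name in enumerate(names)}


def for_each_addresses(names):
    # for_each: instances are keyed by the element itself, e.g. web["a"]
    return {f'web["{name}"]': name for name in names}


def plan(before, after):
    """Diff two address -> input maps (simplified model, not Terraform's planner).

    destroy: addresses that disappear
    create:  addresses that are new
    changed: addresses that survive but are now bound to a different input
    """
    destroy = sorted(a for a in before if a not in after)
    create = sorted(a for a in after if a not in before)
    changed = sorted(a for a in before if a in after and before[a] != after[a])
    return destroy, create, changed


before = ["a", "b", "c"]
after = ["a", "c"]  # remove "b" from the middle of the list

print("count:   ", plan(count_addresses(before), count_addresses(after)))
print("for_each:", plan(for_each_addresses(before), for_each_addresses(after)))
```

With count, removing "b" destroys web[2] (the instance that used to be "c") and rebinds web[1] from "b" to "c"; with for_each, only web["b"] is destroyed and the others are untouched.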
This debugging story really drew me in. It’s so incredibly satisfying the way their initial theory was confirmed so tidily in the end.
Nayef Ghattas — Datadog
In our latest Rootly roundtable, we sat down with a group of seasoned SREs (collectively packing over 100 years of ops scars) to trade notes on what makes an alert useful, what makes it noise, and how to build alerting systems that teams can trust.
Here are their top strategies distilled for you:
Jorge Lainfiesta — Rootly