SRE Weekly Issue #523

A message from our sponsor, Buildkite:

More places to run, more scale to manage and maintain, usually means more blind spots; not here. Buildkite’s control plane holds the live state of every job, agent and queue, regardless of throughput size.

See what’s running, what’s waiting and why with immediate insight → https://buildkite.com/platform/pipelines/

This week, I passed on a couple of articles for the same reason: they contained images with significant text content and no alt text. I don’t always entirely skip such articles, but in this case, the content was relevant enough that I didn’t want to leave folks with screen readers behind.

I have sight, but missing alt text still causes me to stumble. I read the vast majority of articles for the newsletter via text-to-speech. It can be really jarring and confusing when I miss an important thread of an article because it’s in an image. I can stop and take a look, but this can be a great forcing function to remember that others may not be able to.

While I’m here, a quick addendum to last week’s issue: I failed to attribute the AWS article to its author, Harshvardhan Chunawala. Sorry, Harshvardhan!

Oh, I’ve definitely felt that pull to debug as an IC. Gotta either hand over the IC reins or, as this article recommends, find a good tech lead.

  Brent Chapman

If your three data types can’t be joined programmatically today, an AI layer on top won’t fix that; it’ll just be confused faster.

  Pruthvi Raj Seknametla — HackerNoon

In this article, we’ve compiled a selection of tips we wish we had known the first time we picked up the pager or bore the BlackBerry.

  Uptime Labs

Me too. I do so much of my learning from an incident while I’m trying to write about it.

  Lorin Hochstein

The level of candor in this one is commendable. By all rights the maintenance itself went well — the incident was in the communication leading up to it.

  Fred Hebert — Honeycomb

This deep debugging story has a satisfying ending, and I can really feel the level of effort and detective work it took to get there.

  Deanna Lam, Diretnan Domnan, and Matt Lewis

How we made custom instrumentation blazing fast, simple, and data-centric

The answer was not just to throw AI at it.

  Jean-Mark Wright

We were curious whether AI could help us safely evolve a critical production system. This post is about what worked, what didn’t, and what we learned along the way.

I like their approach: AI is only a tool, powerful but not the whole solution.

  Arnold Wakim — Datadog

Updated: June 28, 2026 — 9:43 pm
A production of Tinker Tinker Tinker, LLC