The FCC has released a report on the major Level 3 outage of October 2016. This article serves as a good TL;DR of what went wrong and includes a link to the full report.
Brian Santo — Fierce Telecom
Envato took an awesome approach: use RSpec to create a test suite of HTTP requests, then run it continuously during a deployment to ensure that nothing changes from the end user’s perspective. Bonus points for generating the tests automatically. There’s a rough sketch of the idea below.
Jacob Bednarz — Envato
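To make the idea concrete, here’s a minimal sketch of that kind of end-user smoke test in RSpec. This isn’t Envato’s actual suite: the endpoint and the expected body text are hypothetical placeholders.

```ruby
# smoke_spec.rb: a minimal end-user smoke test. The URL and the expected
# body content below are hypothetical placeholders for illustration.
require "net/http"
require "uri"

RSpec.describe "end-user smoke test" do
  let(:uri) { URI("https://www.example.com/") } # hypothetical endpoint

  it "returns 200 with the expected content" do
    response = Net::HTTP.get_response(uri)
    expect(response.code).to eq("200")
    expect(response.body).to include("Example Domain") # sentinel string
  end
end
```

Running it continuously during a deploy can be as simple as a shell loop, e.g. `while rspec smoke_spec.rb; do sleep 5; done`, which halts the moment a request stops looking right to users.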
Netflix reduced the time it takes to evacuate a failed AWS region from 50 minutes to just 8.
Luke Kosewski, Amjith Ramanujam, Niosha Behnam, Aaron Blohowiak, and Katharina Probst — Netflix
I don’t usually link to talks, but this talk transcript reads almost like an article, and it’s a good one. The premise: if you’re not monitoring well, then you can’t safely test in production. Scalyr found a few ways in which their monitoring showed cracks, and now they’re sharing it with us.
Steven Czerwinski — Scalyr
Design carefully, especially around retries, lest you create a thundering herd that makes it much harder to recover from an outage. That lesson and more, in this article on shooting yourself in the foot at web scale. A sketch of the classic mitigation is below.
Benjamin Campbell — Business Computing World
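The standard antidote to retry-driven thundering herds is capped exponential backoff with jitter, so clients retrying after an outage don’t stampede the recovering service in lockstep. A minimal sketch, with a helper name and limits that are my own rather than the article’s:

```ruby
# Retry a block with capped exponential backoff and full jitter.
# The method name and defaults are illustrative, not from the article.
require "net/http"
require "uri"

def with_backoff(max_attempts: 5, base: 0.5, cap: 30.0)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError
    raise if attempt >= max_attempts
    # Full jitter: sleep a random duration up to the exponential ceiling,
    # so a fleet of clients doesn't retry in synchronized waves.
    sleep(rand * [cap, base * 2**attempt].min)
    retry
  end
end

with_backoff { Net::HTTP.get_response(URI("https://www.example.com/")) }
```

The random jitter spreads retry load over time, the cap keeps any one client from backing off forever, and the attempt limit turns a persistent failure into an error the caller can handle.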
Have I mentioned how much I love GitLab’s openness? Here’s how they handle on-call shift transitions in their remote-only organization.
John Jarvis — GitLab
What is the definition of a distributed system, and why are these systems so difficult? I really love the definition in the second tweet.
I sure love a good troubleshooting story. This one has a pretty excellent failure mode, an A+ investigative technique, and an emphasis on following the problem through until you find an answer.
This discussion of how and why to create a globally-distributed SRE team may only apply to bigger companies, but it’s got a lot of useful bits in it. I just have to stop laughing at the acronym “GD”…
Akhil Ahuja — LinkedIn