There’s some great advice in here. My favorite: be explicit about choosing or not choosing to do something.
incident.io
Live video delivery is an intensely reliability-critical endeavor, and Netflix pulls back the curtain on how they tackled it.
Brett Axler, Casper Choffat, and Alo Lowry — Netflix
Java uses memory outside of the heap, so it can OOM in a container even if the heap size is well below the container’s memory limit.
Ramya vani Rayala — DZone
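To make the heap versus off-heap distinction concrete, here is a minimal sketch (not taken from the article) that uses the standard `java.lang.management` beans to print heap usage next to the non-heap memory the JVM can report on itself. The class name `JvmMemoryReport` and its helper methods are hypothetical, chosen for illustration. Thread stacks and native allocations made outside the JVM's own pools do not show up in these beans, so treat the numbers as a lower bound on the process footprint.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: compares heap usage with JVM-reported off-heap usage.
public class JvmMemoryReport {

    /** Bytes currently used by the Java heap. */
    public static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Bytes used by JVM-managed non-heap pools (for example metaspace and the code cache). */
    public static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    /** Bytes held by NIO buffer pools (direct and mapped buffers). */
    public static long bufferPoolUsedBytes() {
        long total = 0;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            long used = pool.getMemoryUsed(); // -1 means the JVM cannot estimate it
            if (used > 0) {
                total += used;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // maxMemory() is the heap ceiling, not the container's memory limit.
        System.out.printf("heap max:     %d MiB%n", Runtime.getRuntime().maxMemory() / mib);
        System.out.printf("heap used:    %d MiB%n", heapUsedBytes() / mib);
        System.out.printf("non-heap:     %d MiB%n", nonHeapUsedBytes() / mib);
        System.out.printf("buffer pools: %d MiB%n", bufferPoolUsedBytes() / mib);
    }
}
```

If the heap ceiling plus the non-heap and buffer figures approaches the container's memory limit, the process can be killed even though the heap itself never fills up. For a fuller breakdown, HotSpot's Native Memory Tracking (`-XX:NativeMemoryTracking=summary`, read with `jcmd <pid> VM.native_memory summary`) covers more categories than these beans do.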
It’s not about obviously wrong stuff; it’s the queries that look good on the surface that can get you in trouble, per this article. They also share methods to vet LLM-generated SQL.
Readyset
The mental model we use: AI handles the effort so humans can focus on the insight. Not AI instead of thinking.
incident.io
[…] because AI tools continue to make it cheaper to write (and rewrite) code on demand, production environments will become the primary place to evaluate whether software is correct or incorrect.
Peter Farago — RunLLM
The old way: heroes in incident response are an anti-pattern.
The new way: heroes are great and we should make as many heroes as we can.
Hamed Silatani — Uptime Labs
I had to read this one twice before I had my galaxy-brain moment in the second-to-last paragraph.
Lorin Hochstein
