Operating complex systems is about creating accurate mental models, and abstractions are a key ingredient.
Why is it hard to get an organization to focus on LFI (learning from incidents) rather than RCA (root cause analysis)? Here’s a really great explanation.
It’s about more than just money: think engineer morale, slowed innovation, and lost customers.
Aaron Lober — Blameless
A great primer on the CAP theorem with a real-world example scenario.
It’s really interesting to see how they handled queuing and throttling across a highly distributed cache network without sacrificing speed.
George Thomas — Cloudflare
[…] LLMs are black boxes that produce nondeterministic outputs and cannot be debugged or tested using traditional software engineering techniques. Hooking these black boxes up to production introduces reliability and predictability problems that can be terrifying.
Charity Majors — Honeycomb
Full disclosure: Honeycomb is my employer.
Dig into and understand how enough things work, and eventually you’ll look like a wizard.
Rachel By the Bay
As a rule of thumb, always set timeouts when making network calls. And if you build libraries, always set reasonable default timeouts and make them configurable for your clients.
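A minimal sketch of that advice in Python, using only the standard library (the helper name and the 5-second default are illustrative, not from any particular library):

```python
import urllib.request

# Assumed default for illustration; pick a value that makes
# sense for your service's latency budget.
DEFAULT_TIMEOUT = 5.0  # seconds

def fetch(url: str, timeout: float = DEFAULT_TIMEOUT) -> bytes:
    # Always pass an explicit timeout so a hung server can't
    # block the caller indefinitely. Clients can override the
    # default per call.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

The key point is that the timeout is both set by default and exposed as a parameter, so library consumers are protected out of the box but can still tune it.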