They take us from the requirements analysis all the way through implementation of a high-throughput data store based on CockroachDB.
Chuanpin Zhu and Debalin Das — DoorDash
On March 14th, Reddit engineers upgraded a Kubernetes cluster from 1.23 to 1.24, and all hell broke loose. I admire their precision in being down for 100π minutes.
Jayme Howard — Reddit
With a huge user base of students and teachers, these folks upped their incident response game, and they share how.
Nadinastiti and Estu Fardani — GovTech Edu
A lurking bug in redis-py allowed users to see one another’s data, and OpenAI took ChatGPT down to limit the damage.
In Linux, source port allocation can be complex. This article shows why with a ton of code and tracing examples.
Jakub Sitnicki — Cloudflare
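For readers who want to poke at source port allocation before diving into the article, here is a minimal sketch (not taken from the article) of the two usual ways a program ends up with a kernel-chosen source port: binding to port 0 explicitly, or calling connect() without binding first. The function names are hypothetical, and the exact ports returned depend on the local ephemeral port range.

```python
import socket


def port_from_bind(host="127.0.0.1"):
    """Bind to port 0 so the kernel picks a free ephemeral port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def port_from_connect(host="127.0.0.1"):
    """Connect without binding; the kernel picks the source port at connect() time."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))   # throwaway local listener to connect to
        server.listen(1)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server.getsockname())
            return client.getsockname()[1]


if __name__ == "__main__":
    print("bind(0) chose source port:", port_from_bind())
    print("connect() chose source port:", port_from_connect())
```

Both paths hand the choice to the kernel, but they do so at different moments (bind time versus connect time), which is one reason the selection logic can behave differently between them.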
The gap between “paying for peak” and “earning on average” is critical to understanding how the economics of large-scale cloud systems differ from traditional single-tenant systems.
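A toy calculation can make the peak-versus-average gap concrete. The numbers below are invented for illustration and do not come from the linked article; they assume two tenants whose load peaks at different times.

```python
def peak_to_average(load):
    """Ratio of the peak load (what you provision for) to the mean load (what you earn on)."""
    return max(load) / (sum(load) / len(load))


# Hypothetical hourly load for two tenants that peak at different times.
tenant_a = [10, 10, 90, 10]
tenant_b = [90, 10, 10, 10]

# On shared infrastructure the provider provisions for the combined load.
combined = [a + b for a, b in zip(tenant_a, tenant_b)]

if __name__ == "__main__":
    print("tenant A alone:", peak_to_average(tenant_a))
    print("tenant B alone:", peak_to_average(tenant_b))
    print("combined:      ", round(peak_to_average(combined), 2))
    print("sum of separate peaks:", max(tenant_a) + max(tenant_b))
    print("peak of combined load:", max(combined))
```

Under these assumptions each tenant alone must provision three times its average load, while the combined workload needs well under twice its average, because the peaks do not coincide.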
A configuration error was masked because the app automatically fell back to the original configuration. The problem only surfaced when the service was redeployed.
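As a rough illustration of how a silent fallback can hide this kind of error, and one way to make it visible, consider the sketch below. It is not the code from the incident; `load_config`, the JSON format, and the `strict` flag are all assumptions made for the example.

```python
import json
import logging

logger = logging.getLogger("config")


def load_config(raw, fallback, strict=False):
    """Parse a JSON config string.

    Returns (config, fell_back). With strict=False a bad config is replaced
    by `fallback`, but the fallback is logged and reported to the caller
    instead of being silent. With strict=True the error is raised.
    """
    try:
        config = json.loads(raw)
        if not isinstance(config, dict):
            raise ValueError("config must be a JSON object")
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        if strict:
            raise
        logger.warning("invalid config, using fallback: %s", exc)
        return dict(fallback), True
    return config, False


if __name__ == "__main__":
    previous = {"timeout_s": 30}
    config, fell_back = load_config("{not valid json", previous)
    print(config, fell_back)
```

Returning the `fell_back` flag lets a health check or deploy gate fail fast on a bad configuration, rather than discovering it at the next redeploy when no earlier configuration is left to fall back to.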