As a television broadcaster, how do I ensure that my channels are playing out the right thing for my viewers?
This is SRE applied to tv broadcasting: they replaced human monitoring of screens with an automated system.
Jeremy Blythe — evertz.io
Full disclosure: Honeycomb, my employer, is mentioned.
An interview with an engineer about on-call practices, training folks for on-call, and chaos engineering.
Elena Boroda — Fiberplane
SRE: totally defined. Time for a reorg, and with a catchy tune!
Great advice for incident response, backed up by real-world anecdotes.
Audrey Simonne — DZone
There’s a lot to learn from in this air accident. A chilling example: several quirks of the plane’s automation combined to effectively tell the pilot to continue pushing the plane to stall.
When sharding a database, if transactions can span shards, then it can be very difficult to reason about the system’s maximum throughput.
For example, splitting a single-node database in half could lead to worse performance than the original system.
Through Ubuntu’s unattended-upgrades system, a systemd update was installed that broke systemd-resolved, which in turn broke GitHub Codespaces. The systemd bug report they link to is also well worth a read.
Jakub Oleksy — GitHub
we’re, unfortunately, too good at explaining away failures without making any changes to our priors.