Great practical advice on how to present reliability problems (and your proposed solutions) to e-staff.
Ross Brodbeck
It’s when things aren’t always on fire that it can be very difficult to assess whether we need to allocate additional resources to reduce risk.
Lorin Hochstein
The three kinds of roles covered in this article relate to Standards, Operations, and Leadership.
Gavin Cahill — Gremlin
Nagle’s algorithm considered harmful? It’s important to be aware of it because it can trip you up.
Marc Brooker
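For anyone who hasn't bumped into it: Nagle's algorithm holds back small TCP writes while earlier data is still unacknowledged, which can add latency to chatty request/response protocols. Here's a minimal sketch of how you might turn it off per socket in Python using the standard `TCP_NODELAY` option. The helper names `disable_nagle` and `nagle_disabled` are mine for illustration, not taken from the linked article, and whether disabling it helps depends on your workload.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Set TCP_NODELAY so small writes are sent without waiting to coalesce."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    # An unconnected TCP socket is enough to show the option being toggled.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("before:", nagle_disabled(s))
    disable_nagle(s)
    print("after:", nagle_disabled(s))
    s.close()
```

In a real client you'd call `disable_nagle` right after connecting (or accepting) and before the first write.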
In issue #423, I linked to a story about Amazon charging for unauthenticated and failed requests to S3 buckets. Thankfully, they’re no longer charging for that.
Amazon
A little low on details, but interesting nonetheless: Google Cloud did something weird and accidentally deleted a customer’s account out from under them.
UniSuper
What is a “service” in the context of service levels (SLI/SLO)?
Alex Ewerlöf
My favorite part of this one is the description of techniques for improving psychological safety at your company.
Incident.io
SRE Weekly, a production of Tinker Tinker Tinker, LLC