Articles
The Robustness Principle (“be conservative in what you send, and liberal in what you accept”) has its uses, but it may not be best for the development of mature protocols, according to this IETF draft.
Martin Thomson
Does Docker without Kubernetes make sense? These folks have a well-reasoned argument explaining why Kubernetes is not for them.
Maik Zumstrull — Ably
Can a service outage unrelated to security count as a “personal data breach” in terms of GDPR and other regulations? If you agree with this article’s logic, then maybe it can.
Neil Brown
The interactions between security and reliability incidents can be complex and hard to navigate. The example scenarios in this article really made me think.
Quentin Rousseau — Rootly
To deal with thundering herds, reddit implements caching in front of each of its microservices.
Raj Shah — reddit
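For a rough feel of the general idea, here is a minimal Python sketch of a read-through cache that coalesces concurrent misses so only one caller per key reaches the backend. This is not reddit's implementation: the `CoalescingCache` class, the `loader` callback, and the TTL value are all hypothetical names chosen for illustration.

```python
import threading
import time


class CoalescingCache:
    """Read-through cache: on a miss, one caller per key reloads; the rest wait."""

    def __init__(self, loader, ttl_seconds=30.0):
        self._loader = loader            # hypothetical backend call: loader(key) -> value
        self._ttl = ttl_seconds          # assumed freshness window, in seconds
        self._values = {}                # key -> (expires_at, value)
        self._key_locks = {}             # key -> lock serializing reloads of that key
        self._guard = threading.Lock()   # protects the two dicts above

    def _fresh(self, key):
        """Return the cached (expires_at, value) entry if still fresh, else None."""
        entry = self._values.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    def get(self, key):
        with self._guard:
            entry = self._fresh(key)
            if entry is not None:
                return entry[1]
            key_lock = self._key_locks.setdefault(key, threading.Lock())

        # Only one thread per key gets past this lock at a time.
        with key_lock:
            # Re-check: another thread may have filled the cache while we waited.
            with self._guard:
                entry = self._fresh(key)
                if entry is not None:
                    return entry[1]
            value = self._loader(key)    # the single backend hit for this miss
            with self._guard:
                self._values[key] = (time.monotonic() + self._ttl, value)
            return value
```

With this shape, a burst of simultaneous requests for the same cold key results in one backend call; the other callers block briefly and then read the freshly cached value.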
Incident causes are a social construct, and it may be that your organizational structure prevents something from being counted as a cause.
Lorin Hochstein
Check it out: Dropbox has publicly released their SRE career ladder.
Dropbox
There’s a moment halfway through this episode of Page It to the Limit where they talk about blamelessness. If you just tell people to “do blameless postmortems”, but you don’t tell them how, then they’ll be afraid to talk about anything people did, and that will hamper learning.
Julie Gunderson, with guest Tim Nicholas — Page It to the Limit
Moving from MySQL 5.6 to 8.0 was a monumental task, considering the 1000+(!!) internal code patches they had to port.
Herman Lee, Pradeep Nayak — Facebook
Outages
- Akamai
  - Akamai had what they’re calling an “Edge DNS Service Incident”. It made headlines this week because it took down many of their customers, similar to the Akamai incident last month.
- Let’s Encrypt
- Disney park-related apps
- Heroku