If you really want to understand how complex systems fail, you need to think in terms of webs rather than chains.
We asked members of the PagerDuty Community what they do to remove the fear of being on-call, and also asked them to share a piece of advice for those starting out on the on-call rotation. Here are some of their insightful tips!
There’s some interesting advice in here that I haven’t heard before, like rerunning the incident review meeting if you don’t get enough out of it the first time. Have any of you ever done this?
Catchpoint’s annual SRE report is out, and you can download the PDF without even having to fill out a form.
The cool thing about this article is the discussion of anti-patterns to avoid, sprinkled throughout.
Vanessa Huerta Granda — InfoQ
I cover GCP and AWS here a lot, so now it’s Azure’s turn, with this detailed guide on load balancing.
Shivaprasad Sankesha Narayana — DZone
Read this one to learn how Cloudflare implemented a reliable logging pipeline with 1 million log lines per second.
Colin Douch — Cloudflare