SRE Weekly Issue #407

A message from our sponsor, FireHydrant:

Signals is now available in beta. Sign up to experience alerting for modern DevOps teams: Page teams, not services. Ingest inputs from any source. Bucket pricing based on usage. And one platform — ring to retro — finally.

If you really want to understand how complex systems fail, you need to think in terms of webs rather than chains.

  Lorin Hochstein

We asked members of the PagerDuty Community what they do to remove the fear of being on-call, and to share a piece of advice for those starting out on the on-call rotation. Here are some of their insightful tips!

  Xenda Amici

There’s some interesting advice in here that I haven’t heard before, like rerunning the incident review meeting if you don’t get enough out of it the first time. Have any of you ever done this?

  Jonathan Word

Catchpoint’s annual SRE report is out, and you can download the PDF without even having to fill out a form.


The cool thing about this article is the discussions of anti-patterns to avoid, sprinkled throughout.

  Vanessa Huerta Granda — InfoQ

I cover GCP and AWS here a lot, so now it’s Azure’s turn, with this detailed guide on load balancing.

  Shivaprasad Sankesha Narayana — DZone

Read this one to learn how Cloudflare implemented a reliable logging pipeline that handles 1 million log lines per second.

  Colin Douch — Cloudflare

Updated: January 14, 2024 — 9:25 pm
A production of Tinker Tinker Tinker, LLC