Happy BTW: Wear a mask.
LaunchDarkly started off with a polling-based architecture and ultimately migrated to pushing deltas out to clients.
Dawn Parzych — LaunchDarkly
A brief overview of some problems with distributed tracing, along with a suggested alternative approach that uses AI.
Larry Lancaster — Zebrium
This is Google’s post-incident report for their Google Classroom incident on July 7.
Uber has long been a champion of microservices. Now, with several years of experience, they share the lessons they’ve learned and how they deal with some of the pitfalls.
Adam Gluck — Uber
This article opens with an interesting description of what the Cloudflare outage looked like from PagerDuty’s perspective.
Dave Bresci — PagerDuty
This post reflects on two distinct philosophies of safety: that the engineering design should ensure that the system is safe, and that design alone cannot ensure that the system is safe.
You can’t use availability metrics to tell whether your system is reliable enough, because they can only tell you whether you have a problem.