In this blog post, we describe the journey DoorDash took using a service mesh to realize data transfer cost savings without sacrificing service quality.
Hochuen Wong and Levon Stepanian — DoorDash
When just a few “regulars” are called in to handle every incident, you’ve got a knowledge gap to fill in your organization.
David Ridge — PagerDuty
Dropbox expands into new datacenters often, so they have a streamlined and detailed process for choosing datacenter vendors.
Edward del Rio — Dropbox
This is either nine things that could derail your SRE program, or a list of things to do with “not” in front of them — either way, it’s a good list.
Shyam Venkat
We need enough alerting in our systems that we can detect lurking anomalies, but not so much that we get alert fatigue.
Dennis Henry
A post about the importance of product in SRE, and how to make product and SRE first-class citizens in your Software Development Lifecycle.
Jamie Allen
A relatively minor incident took a turn for the worse after the pilots attempted a close fly-by to resolve it. I swear I’ve been in this kind of incident before, where I took risks significantly out of proportion to the problem I was trying to solve.
Kyra Dempsey (Admiral Cloudberg)