Are there any blind or low-vision readers out there who would be willing to answer a few questions? I’m looking to learn more about your experience of reading a newsletter like this and the articles I link to. If you’re interested, please drop me an email at lex at sreweekly dot com. Thanks!
This article shows how to use timed_rotating and multirotate_set to regularly rotate credentials using Terraform.
Andy Leap — Mixpanel
After an incident involving a database schema change, this engineer created a linting system for schema changes to catch painful ones that would cause a full table rewrite.
Fred Hebert — Honeycomb
Full disclosure: Honeycomb is my employer.
Finding Heroku and alternative services lacking for various reasons, these folks built their own Heroku-like platform on top of Kubernetes and migrated their service to it.
Matheus Lichtnow — WorkOS
It’s anything but simple to handle IPv4 and IPv6 in your service. This article covers the nitty-gritty details, including dual-stack resolvers and Happy Eyeballs.
Viacheslav Biriukov
What’s great about an incident? It helps uncover latent flaws in your system, as happened to these folks during a Redis upgrade.
Shayon Mukherjee
Tips on how to handle vendor incidents, from runbooks to incident management and post-incident review.
Mandi Walls — PagerDuty
Cool trick:
[…] when an operational surprise happens, someone will remember “Oh yeah, I remember reading about something like this when incident XYZ happened”, and then they can go look up the incident writeup to incident XYZ and see the details that they need to help them respond.
Lorin Hochstein
While the CAP theorem may be technically correct, the actual limitations it imposes on real-world systems have nuance.
The reality is that CAP is nearly irrelevant for almost all engineers building cloud-style distributed systems, and applications on the cloud.
Marc Brooker