My sincerest apologies to Dale Markowitz, the author of this article, whom I mispronouned in last week’s issue. I’m kicking myself, because I didn’t need to use a pronoun at all.
Dale Markowitz — LOGIC Magazine
Linus Torvalds made waves this week with an email apologizing for his unprofessional behavior and committing to improving.
A pretty detailed article on how LaunchDarkly designed their system for reliability. The streaming vs. polling section is especially interesting.
Adam Zimman — LaunchDarkly
Full disclosure: Fastly, my employer, is mentioned.
Lots of details about how they achieve their reliability goals. I’d love to see a follow-up with more detail on why writing a solution in-house made sense versus adopting something like Kafka.
Mark Marchukov — Facebook
The staging environment plays an important part. If staging isn’t working for your organization, make sure you aren’t making these common mistakes.
Harshit Paul — DZone
The challenges in question involve testing a microservice’s interactions with other microservices. Read about their system for distributing and running mock servers for each microservice.
Mayank Gupta, K.Vineet Nair, Shivkumar Krishnan, Thuy Nguyen, and Vishal Prakash — Grab
My partner suggested I look into the Deepwater Horizon incident, and I’m glad I did. My two key takeaways were normalization of deviance and this gem:
“Researchers who study disasters tell us that a long period without an accident can be a big risk factor in itself: Workers learn to expect safe operation as the norm and can’t even conceive of a devastating failure.”
James B. Meigs — Slate