A bit of a short issue this week, as I spent most of my weekend at my child’s first FIRST Robotics Competition of the season. FRC truly is a microcosm of reliability engineering: teams balance limited time and resources while trying to produce the most reliable bot possible.
Just because Google, Amazon, or Facebook does it doesn’t mean you should. Here are four ‘best practices’ of the hyperscalers you have permission to ignore.
Matt Asay — InfoWorld
An introduction to distributed transactions using the Saga pattern, including pros and cons and two approaches for implementing sagas.
Sid — Scalable Thread
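For readers new to the idea, here is a minimal sketch of a saga, assuming the orchestration style (a central coordinator runs each step and, on failure, runs the compensations for the steps already completed, in reverse order). It is not taken from the linked article; `SagaStep`, `run_saga`, and the order-processing steps are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    """One local transaction plus the action that undoes it (hypothetical type)."""
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on the first failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step is assumed to have made no durable change,
            # so only the previously completed steps are undone.
            for done in reversed(completed):
                done.compensation()
            return False
        completed.append(step)
    return True


# Illustrative in-memory "services": each action just records what it did.
log: List[str] = []


def fail_shipping() -> None:
    raise RuntimeError("shipping unavailable")


steps = [
    SagaStep("reserve_stock",
             lambda: log.append("reserve_stock"),
             lambda: log.append("release_stock")),
    SagaStep("charge_card",
             lambda: log.append("charge_card"),
             lambda: log.append("refund_card")),
    SagaStep("book_shipping",
             fail_shipping,
             lambda: log.append("cancel_shipping")),
]

succeeded = run_saga(steps)
```

In this run the third step fails, so the coordinator refunds the card and then releases the stock. A real system would also need durable saga state and idempotent, retryable compensations, which this sketch leaves out.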
Here’s an argument against real-world “war rooms” for incident response, including a great incident story as an example.
I can’t imagine doing that kind of multi-window parallel investigation stuff on a teeny little laptop screen with people right next to me on either side
rachelbythebay
This one lists the responsibilities of a lead incident responder, along with the tasks they should delegate.
Incident lead isn’t an extra job that you do “on top of” engineering. It’s the main job.
r/devoopseng — Reddit r/sre
Scaling Elasticsearch requires balancing sharding, query performance, and memory tuning for optimal efficiency in high-traffic, real-time applications.
Vivek Kumar — DZone