SRE Weekly Issue #466


A bit of a short issue this week, as I spent most of my weekend at my child’s first FIRST Robotics Competition of the season. FRC truly is a microcosm of reliability engineering, balancing limited time and resources while trying to produce the most reliable bot possible.

No, you don’t have to run like Google

Just because Google, Amazon, or Facebook does it doesn’t mean you should. Here are four ‘best practices’ of the hyperscalers you have permission to ignore.

Matt Asay — InfoWorld

What is Saga Pattern in Distributed Systems?

An introduction to distributed transactions using the Saga pattern, including pros and cons and two approaches for implementing sagas.

Sid — Scalable Thread

Answering reader feedback: war rooms vs. deep investigations

Here’s an argument against real-world “war rooms” for incident response, including a great incident story as an example.

I can’t imagine doing that kind of multi-window parallel investigation stuff on a teeny little laptop screen with people right next to me on either side

rachelbythebay

https://www.reddit.com/r/sre/comments/1j145fx/delegate_aggressively_when_leading_an_incident/

This one includes a list of responsibilities a lead incident responder has and another list of things they should delegate.

Incident lead isn’t an extra job that you do “on top of” engineering. It’s the main job.

r/devoopseng — Reddit r/sre

How to Scale Elasticsearch to Solve Your Scalability Issues

Scaling Elasticsearch requires balancing sharding, query performance, and memory tuning for optimal efficiency in high-traffic, real-time applications.

Vivek Kumar — DZone
