SRE Weekly Issue #340

Articles

SREcon Americas 2020: Exposing the Human Factor

This one’s from a couple of years ago and covers three main themes the author saw at SREcon Americas 2020. Fascinating topics include providing context for newbies, learning from incidents, and rethinking the incident command system.

Taylor Barnett — Transposit

Honeycomb preliminary incident report: Ingestion delays

On September 8, Honeycomb had a major outage in data ingestion, and they’ve posted this preliminary report, “pending an in-depth incident review in the upcoming weeks”.

BONUS CONTENT: A second report covers a separate outage the following day.

Honeycomb
Full disclosure: Honeycomb is my employer.

/r/sre Thread: A “real” day in the life of an SRE

This is neat! Someone posted a day in their life as an actual SRE, and a bunch of commenters followed suit.

Various commenters — Reddit

What’s Difficult About Problem Detection? Three Key Takeaways

Some big names in SRE got together to talk about how to know when your system is broken. Listen to the recording or read this excellent summary that goes in depth on grey failures and more.

Emily Arnott — Blameless

Scaling Robinhood Crypto Systems

To better scale our systems, our infrastructure and product teams got together and decided to make these optimizations: reduce database loads, conduct load tests and size the demand and prioritize critical flows.

…and sharding.

Robinhood

How an incident transformed Razorpay — Building our Command Center

A major incident went poorly, and that catalyzed investment in developing a new incident response system. They worked to transition from swarming to Incident Command.

Vikrant Saini — Razorpay

Consider these 9 microservices best practices to help you ditch your monolith

I love this part:

[…] if you have to deploy your microservices in a certain order, they’re not really microservices.

Cortex

Heroku Incident 2451 Follow-up

This one had an interesting interplay of contributing factors.

Heroku
