We built Edgar to ease this burden, by empowering our users to troubleshoot distributed systems efficiently with the help of a summarized presentation of request tracing, logs, analysis, and metadata.
Kevin Lew, Maulik Pandey, Narayanan Arunachalam, Dustin Haffner, Andrei Ushakov, Seth Katz, Greg Burrell, Ram Vaithilingam, Mike Smith and Elizabeth Carretto — Netflix
The PDF covers 5 main areas:
- Incident Response
No account or form is required to download the PDF.
This one’s especially interesting for its section on what MTTx metrics aren’t good for, and for the following section on how to improve them.
Emily Arnott — Blameless
If you’re interested in deploying Kafka in a multi-region configuration, eBay has put quite a bit of thought into this and has a lot to share.
Engin Yoeyen — eBay
Straight from someone who was there from the start. The “what chaos engineering is not” section is especially enlightening.
Casey Rosenthal — Verica
The last paragraph regarding “unknown unknowns” is noteworthy.
There are some great questions in here on blamelessness and full service ownership.
James Thigpen — Gremlin