In this podcast episode, Courtney Eckhardt and the panel cover a lot of bases related to incident response, retrospectives, defensiveness, blamelessness, social justice, and tons more engrossing stuff. Well worth a listen.
Mandy Moore (summary); John K. Sawers, Sam Livingston-Gray, Jamey Hampton, and Coraline Ada Ehmke (panelists); Courtney Eckhardt (guest)
Do you wonder what effect partitioned versus unified consistency might have on latency? Do you want to know what those terms mean? Read on.
Cape is Dropbox’s real-time event processing system. The design sections of this article are full of interesting detail, and I also love the part where they explain why they didn’t simply use an existing queuing system.
Peng Kang — Dropbox
This is a great intro to the circuit breaker pattern if you’re unfamiliar with it, and it also has plenty of meaty content for folks who already have experience with circuit breakers.
Corey Scott — Grab
Though it sounds counterintuitive, more dashboards often make people less informed and less aligned.
Having a few good dashboards is important, but if you have too many, it’ll get in the way of your ability to do dynamic analysis.
Benn Stancil — Mode
What activities count as SRE work, versus “just” Operations?
Site Reliability Engineers do Operations but are not an Operations Team.