We build our systems against the usage patterns of human users, but agents fundamentally change the game.
Vineet Bhatkoti — DZone
This is an interesting lens for exploring the risks that agents can introduce.
Sayali Patil — VentureBeat
Great discussion in the comments! There’s a lot of variance in how much time people recommend. I personally tend to lean earlier — on-call is a great way to learn, and I can always reach out if I get stuck.
u/modern_medicine_isnt and commenters — Reddit r/sre
A great into to the concept of metastable failures — and I recommend reading the original paper as well.
Teiva Harsanyi
The real issue is that your company has made declaring an incident costly and risky for the person who does it.
Brent Chapman
I enjoyed learning about their deliberate architectural choice to keep their central service in a single AZ. This incident highlighted a need for a fast failover plan.
Coinbase
I like the balance between ensuring 99.99% reliability and designing their product to encourage customers to use their platform in a way that effectively manages the 0.01% case.
Reliability is a customer experience problem
Mike Fisher — incident.io
I’m not gonna spoil this one for you by writing a summary. Just read it, trust me.
Lorin Hochstein
