I’m dedicating this issue to the people of Ukraine, and also to those in Russia who are protesting the invasion.
In this episode of the podcast Page it to the Limit, they discuss learning how to be an incident commander.
There was major AWS outage and the second day I was incident command.
Kat Gaines, with guest Iris Carrera — Page it to the Limit
This article discusses three aspects of fully owning your systems: mandate, knowledge, and responsibility. After defining those terms, it goes on to discuss what happens if one of the three is missing.
I really like the “Managing High RPS” section, especially the part about ignoring events if they’re too old to be relevant any longer.
Ankush Gulati and David Gevorkyan — Netflix
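To make the "ignore events that are too old" idea concrete, here's a minimal sketch in Python. It is not taken from the Netflix article: the `Event` class, the `filter_fresh` helper, and the 30-second cutoff are all hypothetical, chosen only to illustrate the technique.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff: events older than this are assumed to be irrelevant.
MAX_EVENT_AGE_SECONDS = 30.0


@dataclass
class Event:
    name: str
    created_at: float  # Unix timestamp (seconds) when the event was produced


def is_stale(event: Event, now: float, max_age: float = MAX_EVENT_AGE_SECONDS) -> bool:
    """Return True if the event is strictly older than max_age seconds."""
    return now - event.created_at > max_age


def filter_fresh(
    events: List[Event],
    now: Optional[float] = None,
    max_age: float = MAX_EVENT_AGE_SECONDS,
) -> List[Event]:
    """Keep only events that are still recent enough to be worth processing."""
    if now is None:
        now = time.time()
    return [e for e in events if not is_stale(e, now, max_age)]
```

Under heavy load, a consumer could call `filter_fresh` on each batch before doing any expensive work, so that a backlog of outdated events doesn't delay the ones that still matter.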
Cool idea! When a process is overloaded, the system drops requests based on heuristics until the overload condition has passed.
Bryan Barkley — LinkedIn
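For a rough feel of what load shedding can look like, here's a minimal sketch in Python. It is not LinkedIn's implementation: the `LoadShedder` class and its heuristic (a cap on in-flight requests, with a little extra headroom reserved for critical traffic) are assumptions made purely for illustration.

```python
class LoadShedder:
    """Toy load shedder: reject new requests while too many are in flight.

    Hypothetical heuristic: the number of in-flight requests stands in for
    "overload". Critical requests get some extra headroom so they are the
    last to be dropped.
    """

    def __init__(self, max_in_flight: int, critical_headroom: int = 0) -> None:
        self.max_in_flight = max_in_flight
        self.critical_headroom = critical_headroom
        self.in_flight = 0

    def try_admit(self, critical: bool = False) -> bool:
        """Return True and count the request if admitted, False if shed."""
        limit = self.max_in_flight
        if critical:
            limit += self.critical_headroom
        if self.in_flight >= limit:
            return False  # overloaded: drop the request
        self.in_flight += 1
        return True

    def finish(self) -> None:
        """Mark one admitted request as completed."""
        if self.in_flight > 0:
            self.in_flight -= 1
```

A caller would wrap each request in `try_admit()` and `finish()`, returning an error such as HTTP 503 when `try_admit()` returns False. Once enough requests complete, the shedder starts admitting traffic again on its own.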
Here’s another take on incident severity and priority levels. The two terms are different and mean specific things.
Robert Ross — FireHydrant
Can we please agree to stop calling them “postmortems”?
Ash P — Cruform Newsletter
The term “service level” goes back to the US highway system maintenance procedures, among others.
Akshay Chugh and Piyush Verma — Last9
Charity Majors has railed against metrics for years. Now, her company Honeycomb has a metrics product offering. How does she square it?
Charity Majors — Honeycomb
Despite the December AWS outage, folks aren’t fleeing AWS, and multi-cloud designs for reliability still don’t make sense, according to this cloud consultant. The media angle is fascinating.
Lydia Leong — Cloud Pundit
This article has a great list of ideas of who to talk to, plus a section on how to prioritize when you’re short on time.
Daniela Hurtado — Jeli
They posted a followup with details on what happened.
A configuration change inadvertently led to a sudden increase in activity on our database infrastructure.
- crates.io (Rust package repository)
- British Airways
- Truth Social
Due to the overwhelming demand at launch, we are currently rate-limited on onboarding new users to the platform.