Using the Swedish word “Lagom” as a jumping-off point, this article explains the importance of choosing an SLO that is just right: not too lax and not too strict.
A simple security change like ceasing to use IMDSv1 can involve profound risk and necessitate a major migration process.
Archie Gunasekara — Slack
It can be all too easy to let a subset of your IT organization “handle” resilience. If resilience is about the ability to adapt and respond to change, then it needs broad buy-in.
Richard Gall — The New Stack
If any seemingly innocuous change can break our systems, what should we do?
What exactly is “human error”?
Steven Shorrock — Humanistic Systems
We recently upgraded from Postgres 11.9 to 15.3 with zero downtime by using logical replication, a suite of support scripts, and tools in Elixir & Erlang’s BEAM virtual machine.
The team shares a ton of detail about how they did it.
Brent Anderson — Knock
Why do doctors still use antiquated pagers? There’s a lot here that speaks to what it’s really like to operate in an on-call environment, and how to evaluate new tools.
This article riffs on Murphy’s law, using anecdotes to explore the various ways things go wrong.