Here’s a hands-on evaluation of the SLO offerings from three big players in the space. The author includes screenshots of their tests and shares their opinions on each.
Alex Ewerlöf
🔥🔥🔥 Can calling yourself an SRE be a liability?
rachelbythebay
This article outlines some options for combining multiple SLIs. I like the emphasis on ensuring that the result provides a useful overview without sacrificing too much.
Ali Sattari
Lorin Hochstein proposes a rubric for judging whether a company truly is “safety first” in terms of preventing outages.
Lorin Hochstein
In this blog, we’ll present four strategies for successfully managing reliability while adopting Kubernetes.
Andre Newman — Gremlin
I haven’t seen a migration like this before. They managed a slow transition from an old system to a new one, keeping data in sync even though the two applications had entirely different database systems.
Claudio Guidi and Giovanni Cuccu — DZone
[…] what if instead of spending 20 years developing various approaches to dealing with asynchronous IO (e.g. async/await), we had instead spent that time making OS threads more efficient, such that one wouldn’t need asynchronous IO in the first place?
Yorick Peterse
I love a multi-level complex failure.
[…] during this disruption, a secondary issue caused automated failover to not work, rendering the entire metadata storage unavailable despite two other healthy zones being available.