Articles
I’m trying something new: I’m looking for input from you, dear readers!
This link is a Google Form where I’m asking for ideas that I might turn into a blog post or conference talk. If you’re game, I’d love to hear what you think.
Here’s the panel for this webinar:
- Vanessa Huerta Granda (Jeli)
- Emily Ruppe (Jeli)
- Liz Fong-Jones (Honeycomb)
- Fred Hebert (Honeycomb)
Honestly, with that set of names, I’d listen even if they were just discussing the weather.
Full disclosure: Honeycomb, my employer, is mentioned.
This week saw an outage of the NOTAM system, which disseminates important information to aircraft pilots in the US. As a result, all flights in the US were grounded.
There’s not much in the way of interesting detail available yet, but I did see a mention of this air incident in which NOTAMs played a significant part. Mentour Pilot also covered this one.
Admiral Cloudberg
In essence, this new reliability is:
- The health of your system
- Weighed based on customer expectations and happiness
- Prioritized based on your current capabilities
This article focuses on the sociotechnical aspects of reliability.
Jim Gochee — The New Stack
Here are some guidelines for what kind of alerting works best for services at various stages of maturity.
Ali Sattari
The actions we take to avert a potential problem can introduce their own risks.
Will Gallego
This one’s from the incident.io folks.
incident.io
I often meet with skepticism when I say that server monitoring systems should only page when a service stops doing its work.
Read on to find out why.
Dan Slimmon