Articles
This company had a really challenging on-call situation to fix: a monolithic codebase, and a team so large that folks were out of practice by the time their turn in the on-call rotation came around.
Molly Struve
This article includes charts, observations, and conclusions from the author’s by-hand analysis and categorization of several hundred incidents.
Subbu Allamaraju
Responding to a suggestion to write alerts for everything, Charity Majors shares her ideas for a better way.
Charity Majors (@mipsytipsy)
Where many databases use threading to handle concurrent clients, PostgreSQL forks one child process per client. This has ramifications that an operator must take into consideration.
Kristi Anderson — High Scalability
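To make the process-per-client model concrete, here is a minimal sketch in Python. It is not PostgreSQL's code and is not taken from the linked article; the function name `handle_clients_by_forking` and the client labels are made up for illustration, and it assumes a Unix-like system where `os.fork` is available.

```python
import os


def handle_clients_by_forking(client_ids):
    """Fork one child process per client and return {client_id: child_pid}.

    Illustrative only: a real server would accept sockets, not labels.
    """
    pipes = {}
    for client_id in client_ids:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: dedicated to exactly one client. It reports its own
            # PID back to the parent and exits.
            os.close(read_fd)
            os.write(write_fd, str(os.getpid()).encode())
            os.close(write_fd)
            os._exit(0)
        # Parent: keep only the read end and move on to the next client.
        os.close(write_fd)
        pipes[client_id] = (pid, read_fd)

    handled = {}
    for client_id, (pid, read_fd) in pipes.items():
        with os.fdopen(read_fd) as reader:
            handled[client_id] = int(reader.read())
        os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return handled


if __name__ == "__main__":
    print(handle_clients_by_forking(["client-a", "client-b", "client-c"]))
```

Every client ends up with its own operating system process rather than a thread inside a shared one, so each connection carries the cost of a full process. That is the kind of ramification an operator has to plan for.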
This article is about the attributes of anomaly detection systems, but it doesn’t mention a specific one. I have yet to find an anomaly detection system whose false positives don’t render it useless.
Hive mind: if you’re using an anomaly detection system that actually works and doesn’t drown you with false positives, I want to hear about it. Bonus points if you want to write an article about it!
Amit Levi