This company had a really challenging on-call situation to fix: a monolithic codebase, and a team so large that folks in the on-call rotation were out of practice by the time their turn came around.

This article includes charts, observations, and conclusions from the author’s by-hand analysis and categorization of several hundred incidents.

Where many databases use threading to handle concurrent clients, PostgreSQL forks one child process per client. This has ramifications that an operator must take into consideration.
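For a rough feel of what "one child process per client" means, here is a minimal Python sketch of the general fork-per-client pattern. It is not PostgreSQL's actual code: the function name `handle_clients_with_fork` and the simulated integer "clients" are assumptions made purely for illustration, and it only runs on Unix-like systems where `os.fork` is available.

```python
import os

def handle_clients_with_fork(n_clients):
    """Fork one child per simulated client; return {client_id: backend_pid}.

    Hypothetical illustration of a process-per-client design.
    """
    read_fd, write_fd = os.pipe()
    children = []
    for client_id in range(n_clients):
        pid = os.fork()
        if pid == 0:
            # Child: acts as the dedicated "backend" for this one client.
            os.close(read_fd)
            os.write(write_fd, f"{client_id}:{os.getpid()}\n".encode())
            os._exit(0)  # exit without running the parent's cleanup handlers
        children.append(pid)

    # Parent: close its write end so the read below sees EOF
    # once every child has exited.
    os.close(write_fd)
    chunks = []
    while True:
        data = os.read(read_fd, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(read_fd)

    # Reap every child so none are left behind as zombies.
    for pid in children:
        os.waitpid(pid, 0)

    handled = {}
    for line in b"".join(chunks).decode().splitlines():
        client_id, backend_pid = line.split(":")
        handled[int(client_id)] = int(backend_pid)
    return handled
```

Each client ends up with its own operating-system process, so every extra connection costs a full process that the kernel has to create and schedule, and that the parent has to reap.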

This article is about the attributes of an anomaly detection system, but it doesn’t mention a specific system. I have yet to find an anomaly detection system that doesn’t produce so many false positives that it’s useless.

Hive mind: if you’re using an anomaly detection system that actually works and doesn’t drown you in false positives, I want to hear about it. Bonus points if you want to write an article about it!

