SRE Weekly Issue #190

Articles

This company had a really challenging on-call situation to fix. Monolithic codebase, and a huge team with so many people in the on-call rotation that folks were out of practice by the time it was their turn.

Molly Struve

Incidents — Trends from the Trenches

This article includes charts, observations, and conclusions from the author’s by-hand analysis and categorization of several hundred incidents.

Subbu Allamaraju

@mipsytipsy on Twitter: don’t alert on everything

Charity Majors replied to a suggestion to write alerts for everything with her ideas for a better way.

Charity Majors (@mipsytipsy)

PostgreSQL Connection Pooling: Part 1 – Pros & Cons

Where many databases use threading to handle concurrent clients, PostgreSQL forks one child process per client. This has ramifications that an operator must take into consideration.

Kristi Anderson — High Scalability

Four Key Attributes of Advanced Anomaly Detection

This article is about attributes, but it doesn’t mention a specific system. I have yet to find an anomaly detection system that doesn’t produce so many false positives that it’s useless.

Hive mind: if you’re using an anomaly detection system that actually works and doesn’t drown you with false positives, I want to hear about it. Bonus points if you want to write an article about it!

Amit Levi

SRE Weekly Issue #190

Articles

Outages

Subscribe

RSS

Mastodon

Search Issues

A message from our sponsor, VictorOps:

Articles

Outages

Subscribe

RSS

Mastodon

Search Issues