SRE Weekly Issue #47

Articles

SRECon17 Americas Call for Participation

Next year, SRECon is expanding to three events: Americas, EMEA, and Asia. The Americas event is also moving from Santa Clara to San Francisco, which I, for one, am especially grateful for. The CFP for SRECon17 Americas just opened up, and proposals are due November 30th, so get cracking! I can’t wait to see what all of you have to share!

Introducing anomaly detection in Datadog

I have a somewhat dim view of automated anomaly detection in metrics based on my experience with a few tools, but if Datadog’s algorithms live up to their description, they might really have something worthwhile.

When a responder gets an anomaly alert, he or she needs to know exactly why the alert triggered. The monitor status page for anomaly alerts shows what the metric in question looked like over the alert’s evaluation window, overlaid with the algorithm’s predicted range for that metric.
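To make the idea of a "predicted range" concrete, here is a minimal sketch of a band-based check. It is not Datadog's algorithm, whose internals are not described here. It assumes a simple rolling window and a mean plus or minus `k` standard deviations band, and the names `predicted_range`, `find_anomalies`, `window`, and `k` are hypothetical.

```python
from statistics import mean, stdev


def predicted_range(history, k=3.0):
    """Return a (low, high) band: mean of history +/- k sample standard deviations.

    Assumption: a plain statistical band, chosen only for illustration.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def find_anomalies(values, window=5, k=3.0):
    """Return (index, value, low, high) for each point outside the band
    predicted from the `window` points immediately before it."""
    anomalies = []
    for i in range(window, len(values)):
        low, high = predicted_range(values[i - window:i], k)
        if not (low <= values[i] <= high):
            anomalies.append((i, values[i], low, high))
    return anomalies


if __name__ == "__main__":
    series = [10, 11, 10, 12, 11, 10, 50, 11]
    for index, value, low, high in find_anomalies(series):
        # Reporting the band next to the value tells the responder why it fired.
        print(f"point {index}: {value} outside [{low:.2f}, {high:.2f}]")
```

Returning the band along with the offending value mirrors the point of the quoted passage: the alert carries the evidence a responder needs to see why it triggered.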

From Zero to Staging and Back

This issue of Production Ready chronicles Mathias Lafeldt’s effort to create a staging environment. I like the emphasis on using an entirely separate AWS account for staging. This is increasingly becoming a best practice.

Honeycomb :: Nylas Guest Post: Ghosts in the WSGI Machine

What’s causing all that API request latency? Here’s an interesting debug run using Honeycomb. Negative HTTP status codes? Sure, that’s totally a thing, right?

The Irreproducibility of Bugs in Large-Scale Production Systems

I love this idea: Susan Fowler notes that large, complex systems are constantly changing, and this makes reproducing bugs difficult or impossible. Her suggestion is to log enough that you can logically reconstruct the state of the system at the time the bug occurred. This is the same kind of thing the Honeycomb folks are saying: throw a lot of information into your logs, just in case you might need it to debug something unforeseen.
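As a rough illustration of "log enough to reconstruct the state", the sketch below emits one JSON line per event with whatever context the caller can capture. It is an assumption about how one might do this, not a method from Fowler's article or Honeycomb's product. The helper names `build_event` and `log_event` and the example fields (`request_id`, `build`, `host`, `upstream_latency_ms`) are hypothetical.

```python
import json
import logging
import time


def build_event(message, **context):
    """Bundle a message with arbitrary system state captured at call time."""
    # Context goes first so the reserved keys below cannot be overwritten.
    return {**context, "ts": time.time(), "message": message}


def log_event(logger, message, **context):
    """Emit the event as a single JSON line and return it for inspection."""
    event = build_event(message, **context)
    # default=str keeps logging from failing on values JSON cannot encode.
    logger.info(json.dumps(event, sort_keys=True, default=str))
    return event


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("api")
    log_event(
        log,
        "request failed",
        request_id="abc123",       # hypothetical field
        build="2016-11-06.1",      # hypothetical field
        host="web-17",             # hypothetical field
        upstream_latency_ms=912,   # hypothetical field
    )
```

One JSON object per line keeps the extra fields cheap to add now and easy to query later, when the system has already changed and the bug can no longer be reproduced.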

Outages

Instagram
Level 3
- Another big Level 3 outage.
Battlefield 1 (game)
AT&T
- Three separate outages.
FIFA 17 (game)
Unprecedented cyber attack takes Liberia’s entire internet down
- The attackers used the Mirai botnet, the same one used to attack Dyn.
