SRE Weekly Issue #104


Well, that was a fun week. I hope all of you have had a chance to rest after any hectic patching you might have been involved in.

Articles

Safety Moment – The Power of Local Rational…it is big!

Local Rationale: the reasoning and context behind a decision that an operator made. Here’s Todd Conklin reminding us to find out what was really going on when the benefit of hindsight makes a decision seem irrational.

Building a Distributed Log from Scratch, Part 2: Data Replication

In part two of the series I linked to last week, Tyler Treat introduces data replication strategies, including waiting for a write to reach every replica before acknowledging it, or waiting for only a quorum of replicas.
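
To make the "all replicas vs. quorum" distinction concrete, here is a minimal in-memory sketch. It is not Tyler Treat's implementation: the `Replica` class, the `replicate` function, and its `mode` parameter are hypothetical, there is no networking, and followers are contacted one after another. It only shows how the acknowledgement rule decides whether a write counts as committed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """A toy follower that stores log entries in memory (hypothetical, for illustration)."""
    name: str
    up: bool = True
    log: List[str] = field(default_factory=list)

    def append(self, entry: str) -> bool:
        # A real follower would receive the entry over the network;
        # here a replica that is down simply fails to acknowledge it.
        if not self.up:
            return False
        self.log.append(entry)
        return True


def replicate(leader_log: List[str], followers: List[Replica], entry: str, mode: str) -> bool:
    """Write `entry` on the leader, push it to followers, and report whether it is committed.

    mode="all":    every replica (leader included) must acknowledge.
    mode="quorum": a strict majority of replicas must acknowledge.
    """
    if mode not in ("all", "quorum"):
        raise ValueError("mode must be 'all' or 'quorum'")
    leader_log.append(entry)
    acks = 1  # the leader's own copy counts as one acknowledgement
    for follower in followers:
        if follower.append(entry):
            acks += 1
    total = len(followers) + 1
    needed = total if mode == "all" else total // 2 + 1
    # False means "not committed yet"; a real log would keep the entry
    # and retry rather than report the write as durable.
    return acks >= needed


if __name__ == "__main__":
    leader: List[str] = []
    followers = [Replica("b"), Replica("c", up=False)]
    print(replicate(leader, followers, "x", "quorum"))  # True: 2 of 3 replicas acked
    print(replicate(leader, followers, "y", "all"))     # False: replica c is down
```

The trade-off shows up with a single failed follower: waiting for all replicas blocks the write, while a quorum of two out of three still commits it, at the cost of one replica lagging behind.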

Developing a Hospital Emergency Incident Command System (HEICS)

Here’s something I wasn’t aware of: hospitals have their own version of the Incident Command System (ICS).

Google Cloud Platform Blog: Consequences of SLO violations

In this blogpost, we discuss why you should create a policy on how SREs and devs respond to SLO violations, and provide some ideas for the structure and components of that policy.
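
A policy like this usually hinges on knowing when the SLO has actually been violated. The sketch below computes how much of the error budget is left for a request-based SLO and maps it to an action. The function names, the 25% threshold, and the action strings are hypothetical examples, not taken from the Google post; a real policy's thresholds and responses are whatever SREs and developers agree on.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget left in a window; negative means the SLO was missed.

    slo_target is the required success ratio, e.g. 0.999 for 99.9% of requests succeeding.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures


def policy_action(remaining: float) -> str:
    """Map remaining budget to an agreed response (example thresholds, not a standard)."""
    if remaining < 0:
        return "SLO violated: freeze feature launches until reliability work lands"
    if remaining < 0.25:
        return "budget nearly spent: extra review for risky changes"
    return "within budget: normal release cadence"


if __name__ == "__main__":
    left = error_budget_remaining(0.999, total=1_000_000, failed=500)
    print(round(left, 3), policy_action(left))
```

With a 99.9% target over 1,000,000 requests, about 1,000 failures are allowed, so 500 failures leaves roughly half the budget and 2,000 failures puts the service in violation.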

This Is What it Takes to Measure the Internet

Now this is neat. This research team pings basically the entire internet all the time and can track outages across the globe. They can see events like Egypt shutting down internet access for all of its citizens, as well as the effects of hurricanes.
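
The core idea of inferring outages from pings can be illustrated with a deliberately simplified sketch: for each address block, compare this round's reply rate against the rate that block normally shows, and flag blocks that suddenly go quiet. This is an assumption-laden toy, not the research team's method (real systems must handle noise, sparse blocks, and probing cost far more carefully); `outage_blocks` and its `drop_ratio` parameter are hypothetical, and the prefixes are RFC 5737 documentation ranges.

```python
from typing import Dict, List


def outage_blocks(baseline: Dict[str, float],
                  probes: Dict[str, List[bool]],
                  drop_ratio: float = 0.2) -> List[str]:
    """Flag address blocks whose current reply rate fell far below their usual rate.

    baseline:   historical fraction of probed addresses that normally answer, per block
    probes:     this round's results (True = reply received), per block
    drop_ratio: flag a block if its reply rate is below drop_ratio * baseline
    """
    flagged = []
    for block, results in probes.items():
        usual = baseline.get(block, 0.0)
        if usual == 0 or not results:
            continue  # never-responsive or unprobed blocks tell us nothing
        current = sum(results) / len(results)
        if current < drop_ratio * usual:
            flagged.append(block)
    return flagged


if __name__ == "__main__":
    baseline = {"192.0.2.0/24": 0.5, "198.51.100.0/24": 0.4}
    probes = {
        "192.0.2.0/24": [False] * 10,                    # went silent
        "198.51.100.0/24": [True] * 4 + [False] * 6,     # normal behaviour
    }
    print(outage_blocks(baseline, probes))
```

Comparing against each block's own baseline matters because many blocks never answer most pings; an absolute "no replies" rule would flag them constantly.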

How Log Analysis Can Bring Front-End Engineers on Call

This is a summary of a couple of talks from InfluxDays. I especially like the bit about Baron Schwartz’s talk on the pitfalls of anomaly detection.

Speculative Execution Exploit Performance Impacts – Describing the performance impacts to security patches for CVE-2017-5754 CVE-2017-5753 and CVE-2017-5715

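Meltdown (CVE-2017-5754) is especially scary because its fix, which isolates the kernel's page tables from user space, makes every transition into the kernel more expensive, so system-call-heavy workloads can take a significant performance hit.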
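
If you want a rough before/after number on your own Linux hosts, a crude check is to time a trivial system call in a loop and to read the kernel's own mitigation report. This is a sketch under stated assumptions: it assumes Linux, where `os.getppid()` enters the kernel on every call, and a kernel new enough to expose `/sys/devices/system/cpu/vulnerabilities/meltdown`. The function names are hypothetical, and the timing includes Python interpreter overhead, so only compare numbers taken the same way on the same machine.

```python
import os
import time


def syscall_cost_ns(iterations: int = 200_000) -> float:
    """Rough average time in nanoseconds per os.getppid() call, interpreter overhead included."""
    getppid = os.getppid  # bind once so the loop measures the call, not the lookup
    start = time.perf_counter_ns()
    for _ in range(iterations):
        getppid()
    return (time.perf_counter_ns() - start) / iterations


def meltdown_status() -> str:
    """Report the kernel's Meltdown mitigation status, if the sysfs file exists."""
    path = "/sys/devices/system/cpu/vulnerabilities/meltdown"
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "unknown (no sysfs vulnerability report on this system)"


if __name__ == "__main__":
    print("meltdown:", meltdown_status())
    print(f"~{syscall_cost_ns():.0f} ns per getppid() call")
```

A microbenchmark like this exaggerates the effect compared to most real services; it is useful mainly for confirming that a patched host behaves differently, not for predicting application slowdown.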

Outages

WhatsApp
- WhatsApp had trouble at the stroke of midnight on New Year’s Day.
US Customs
Yahoo Mail
New York (US state) tax department and DMV
Funimation and Crunchyroll
- Two anime sites were down, preventing fans from viewing a big new release.
