SRE Weekly Issue #141

Articles

Rethinking Netflix’s Edge Load Balancing

An outline of the design of Netflix’s new load balancer, with special emphasis on dealing with faltering backends. Great idea: servers report their utilization level in a response header. Tricky pitfall: the LB is so good at moving requests off of ailing backends that backend failure rate alerts don’t fire.

Mike Smith — Netflix
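
To make the utilization-header idea concrete, here is a minimal sketch of a balancer that records each backend's self-reported utilization and prefers the less loaded of two randomly sampled backends. The header name `X-Server-Utilization`, the 0.0 to 1.0 scale, the `Balancer` class, and the two-sample selection are all assumptions made for illustration; this is not Netflix's implementation.

```python
import random

# Hypothetical header name; the summary above does not specify one.
UTILIZATION_HEADER = "X-Server-Utilization"


class Balancer:
    def __init__(self, backends, rng=None):
        # Assume a backend is idle until it reports otherwise.
        self.utilization = {name: 0.0 for name in backends}
        self.rng = rng or random.Random()

    def record_response(self, backend, headers):
        """Update a backend's utilization from its response headers."""
        raw = headers.get(UTILIZATION_HEADER)
        if raw is None:
            return
        try:
            value = float(raw)
        except ValueError:
            return  # ignore malformed reports, keep the last known value
        # Clamp to the assumed 0.0-1.0 scale.
        self.utilization[backend] = min(max(value, 0.0), 1.0)

    def pick(self):
        """Sample two backends at random and return the less utilized one."""
        names = list(self.utilization)
        if len(names) == 1:
            return names[0]
        a, b = self.rng.sample(names, 2)
        return a if self.utilization[a] <= self.utilization[b] else b
```

A sketch like this also shows where the alerting pitfall comes from: once `pick` steers traffic away from a struggling backend, that backend serves few requests, so alerts based only on its failure rate may never fire.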

NewSQL database systems are failing to guarantee consistency, and I blame Spanner

This article begins by explaining consistency versus availability in distributed data stores and argues that the trade-off is less significant than one might think. Then it describes a pitfall seen in some new data stores. I’ve pondered aloud here in the past on how Spanner can make the claims it does, and this article explains that nicely.

Daniel Abadi

The redux of the fallacies of distributed computing

And here’s a refutation of part of the previous article by the creator of RavenDB.

Ayende Rahien

Getting The Airlines Back On Their Feet After A Disaster

It is tempting to think that ensuring the resilience or continuity of all the individual parts of a business will guarantee the resilience or continuity of the whole.

Dr. Sandra Bell

Upgrading GitHub from Rails 3.2 to 5.2

GitHub used an innovative technique to avoid holding open a long-running code branch while upgrading their application to support Rails 5.2.

Eileen Uchitelle — GitHub

Travis CI: Build VMs boot failure on the sudo-enabled infrastructure: incident postmortem

Worker node errors led to cascading failure when they hit Google Compute Engine quotas.

Bogdana Vereha — Travis CI

Secret IBM script could have prevented 11-hour US tax day outage

This week, the US Internal Revenue Service (IRS) issued a report analyzing the tax-day outage that occurred this past April. Linked is a nice summary by the Register.

Thanks to reader Michael Fischer for a tip on the report.

Chris Mellor — The Register

Outages

Facebook
Amazon Alexa
Delta Airlines
Honeywell (smart thermostat manufacturer)
Zoho
- SaaS provider Zoho’s domain registration was revoked by its registrar after a run-of-the-mill phishing complaint, affecting 30 million users.
Steemit
