SRE Weekly Issue #120

Articles

Non-uniform memory access meets the OOM killer

“You can OOM a single NUMA node” thus entered my list of things to worry about when a box seems to have plenty of memory but still goes off and slaughters innocent (but big) processes.

Rachel Kroll
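
To check whether a box is in this state, compare free memory per NUMA node rather than system-wide. The sketch below is one rough way to do that on Linux, assuming the per-node files under `/sys/devices/system/node/node*/meminfo` and their usual `Node N Field: value kB` line format. The function names and the 5% threshold are illustrative choices, not anything from the linked article, and `MemFree` ignores reclaimable page cache, so treat a hit as a hint rather than a diagnosis.

```python
import glob
import re

# Matches lines such as "Node 0 MemFree:  1234 kB" from a per-node meminfo file.
_LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s*kB")


def parse_node_meminfo(text):
    """Return {field_name: kilobytes} for one node's meminfo text."""
    fields = {}
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m:
            fields[m.group(2)] = int(m.group(3))
    return fields


def low_nodes(per_node, min_free_fraction=0.05):
    """Return ids of nodes whose MemFree/MemTotal is below the threshold."""
    low = []
    for node, fields in sorted(per_node.items()):
        total = fields.get("MemTotal", 0)
        free = fields.get("MemFree", 0)
        if total and free / total < min_free_fraction:
            low.append(node)
    return low


def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Read every node's meminfo file; returns {} on systems without them."""
    per_node = {}
    for path in glob.glob(pattern):
        node = int(re.search(r"node(\d+)", path).group(1))
        with open(path) as f:
            per_node[node] = parse_node_meminfo(f.read())
    return per_node
```

Calling `low_nodes(read_all_nodes())` on a multi-socket machine returns the nodes that look starved even when the machine-wide total seems healthy.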

Conference, Interrupted – Food Fight

In this podcast episode, the panelists hold a retrospective for the snow-related delay of DevOps Days Baltimore. Toward the end they go into the idea of reliability and single points of failure with respect to conference planning. My favorite quote in the show, from Nell Shamrell-Harrington:

Incident Management is never about technology — it’s a people.

Nell Shamrell-Harrington and Nathen Harvey

‘I crashed AOL for 19 hours and messed up global email for a week’

I really love this Who, Me? section from The Register.

Simon Sharwood — The Register

Gremlin’s Tammy Bütow on the Business Side of Chaos Engineering

This article has a great discussion of how to get started with chaos engineering — and how to avoid biting off more than you can chew.

Jennifer Riggins — The New Stack

Stateless datacenter load-balancing with Beamer | the morning paper

Beamer is a stateless datacenter load balancer supporting both TCP and Multipath TCP (MPTCP). It manages to keep the load balancers stateless by taking advantage of connection state already held by servers.

Super-clever! The LB does keep state, but the size of the state is constant, unrelated to the number of connections flowing through it.

Adrian Colyer — summary, Olteanu et al. — original paper
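
To make the "constant-size state" point concrete, here is a toy sketch of one way a balancer can avoid per-connection state: hash each connection into a fixed number of buckets, keep only a bucket-to-server table, and let servers hand packets for connections they do not recognize back to the bucket's previous owner. Everything here (`Mux`, `Server`, `NUM_BUCKETS`, the hashing choice) is a hypothetical illustration and not the paper's implementation.

```python
import hashlib

NUM_BUCKETS = 8  # fixed table size; an arbitrary value for illustration


def bucket_for(five_tuple, num_buckets=NUM_BUCKETS):
    """Hash a connection 5-tuple to a stable bucket index."""
    key = "|".join(map(str, five_tuple)).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


class Server:
    def __init__(self, name):
        self.name = name
        self.connections = set()  # per-connection state lives on the server

    def handle(self, five_tuple, previous, is_syn=False):
        """Return the name of the server that ends up serving the packet."""
        if is_syn:
            self.connections.add(five_tuple)
            return self.name
        if five_tuple in self.connections:
            return self.name
        if previous is not None:
            # Unknown connection: forward to the bucket's previous owner.
            return previous.handle(five_tuple, None)
        return None  # nobody knows this connection


class Mux:
    def __init__(self, servers, num_buckets=NUM_BUCKETS):
        # One entry per bucket: (current server, previous server or None).
        self.table = [(servers[i % len(servers)], None)
                      for i in range(num_buckets)]

    def reassign(self, bucket, new_server):
        current, _ = self.table[bucket]
        self.table[bucket] = (new_server, current)

    def route(self, five_tuple):
        return self.table[bucket_for(five_tuple, len(self.table))]
```

The table in `Mux` never grows with the number of connections. Only `reassign` changes it, and the entry count stays at `NUM_BUCKETS`.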

Programming Sucks

Sometimes it’s worthwhile to lay everything out and describe just exactly what we’re up against as SREs. The analogies here are pretty awesome. Read this for a hefty dose of cynicism about the state of our increasingly computer-driven world.

Peter Welch

Outages

Google Groups
Indian Railway ticketing system
Twitter
- Site crashes just hours after telling millions to change their passwords due to an internal glitch, leaving users unable to sign on
