SRE Weekly Issue #240

A message from our sponsor, StackHawk:

Be sure to register for SnykCon to learn about the latest DevSecOps trends. And while you are there, check out the StackHawk booth for our Nintendo Switch giveaway.
http://bit.ly/SnykConStackHawk

Articles

This interesting post-incident analysis is marked as “Google Customer Confidential – Not for publication or distribution”, but Google linked it directly from their public status page. I normally would not include a seemingly “leaked” incident report like this, but in this case I think the “confidential” label is erroneous.

Google

I keep re-learning and re-forgetting about TCP_NODELAY.

Rachel By the Bay
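
For anyone else who has re-forgotten: TCP_NODELAY is the socket option that disables Nagle's algorithm, so small writes go out immediately instead of being held back to coalesce into larger segments. Here is a minimal Python sketch of setting and checking it. It is not taken from the linked post, and the helper names `make_nodelay_socket` and `nodelay_enabled` are made up for illustration.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value turns TCP_NODELAY on for this socket.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Report whether TCP_NODELAY is currently set on the socket (hypothetical helper)."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_nodelay_socket()
    try:
        print("TCP_NODELAY enabled:", nodelay_enabled(s))
    finally:
        s.close()
```

Whether turning it on actually helps depends on the traffic pattern: it tends to matter for small request/response exchanges and much less for bulk transfers.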

The distinction between the two is a lot more nuanced than it may seem. What are we really trying to say with those words?

Michael Nygard

This incident from the week before last involved a Let’s Encrypt API rate limit.

Don’t you hate when you’re minding your own business upgrading your OS, and you run smack into a kernel bug in the ext4fs code?

…ext4 performance on kernel versions above 4.5 and below 5.6 suffers severely in the presence of concurrent sequential I/O on rotating disks.

Ryan Underwood — LinkedIn

Google discusses DDoS attacks and how they deal with them, including a 2.5Tbps attack in 2017.

Damian Menscher — Google

I love these first-hand incident stories. This one is from an engineer at Heroku who was a contributing factor in an incident last month.

Damien Mathieu — Heroku (Salesforce)

Outages

Updated: October 18, 2020 — 8:46 pm
A production of Tinker Tinker Tinker, LLC