Articles
This interesting post-incident analysis is marked as “Google Customer Confidential – Not for publication or distribution”, but Google linked it directly from their public status page. I normally would not include a seemingly “leaked” incident report like this, but in this case I think the “confidential” label is erroneous.
I keep re-learning and re-forgetting about TCP_NODELAY.
Rachel By the Bay
The distinction between the two is a lot more nuanced than it may seem. What are we really trying to say with those words?
Michael Nygard
This incident from the week before last involved a Let’s Encrypt API rate limit.
Don’t you hate when you’re minding your own business upgrading your OS, and you run smack into a kernel bug in the ext4fs code?
…ext4 performance on kernel versions above 4.5 and below 5.6 suffers severely in the presence of concurrent sequential I/O on rotating disks.
Ryan Underwood — LinkedIn
Google discusses DDoS attacks and how they deal with them, including a 2.5Tbps attack in 2017.
Damian Menscher — Google
I love these first-hand incident stories. This one is from an engineer at Heroku who was a contributing factor in an incident last month.
Damien Mathieu — Heroku (Salesforce)
Outages
- BitBay
- Twitter
- It definitely was not taken down purposefully to protect a US presidential election candidate.
- TikTok
- Crunchyroll
- Barnes and Noble
- Nook e-readers have experienced a days-long service disruption.
- keepthescore
- Linked is their blog post, “We deleted the production database by accident”.
Be sure to check out the Hacker News discussion about this article, too.
Caspar — Keepthescore
- FanDuel
- This incident seems to be ongoing, October 12 to present.