Articles
Segment discovered the hard way that their move to a microservice architecture had brought far more problems than benefits. Here’s why they transitioned back and how they pulled it off. Awesome article!
Alexandra Noonan — Segment
Drawing on the work of Dr. David Woods and the rest of the SNAFU Catchers, this article discusses the concepts behind resiliency and how to measure and achieve it.
Beth Long — New Relic
Serverless is not the magical gateway to the land of NoOps. You still have to operate your system even if you’re not directly running the servers. This article does a great job of explaining why.
Bhanu Singh — Network World
New to me: Wireshark’s statistics view and how it can be useful.
Julia Evans
How do you define whether your system is available and healthy? This article draws an analogy to medical health.
Claiming that our system is doing well means nothing if users can perceive an outage.
José Carlos Chávez — Typeform
These folks are experiencing mysterious latency with HTTP/2 traffic to their ALB and published this report on their investigation. There's no happy ending here: ultimately they disabled HTTP/2 support. I hope they post an update if they do discover the culprit.
Peter Forsberg — ShopGun
I had some fun this week unearthing the cause of the chronic lockups in Rsyslog that we've experienced at work. I found the cause (overeager retries of socket writes) and put together a bug report and a pull request.
Full disclosure: Fastly, my employer, is mentioned.
I love science! Grab wrote a nifty tool to help them select cohorts of users and perform experiments on them.
Abeesh Thomas and Roman Atachiants — Grab
Titus is the container orchestration system that Netflix created and open sourced. Rather than building a new auto-scaling feature for Titus, they instead got Amazon to productize EC2’s auto-scaling engine as a generalized auto-scaling tool, which Netflix then integrated with Titus. Neat!
See Amazon’s Application Auto Scaling announcement, published this past week.
Andrew Leung, Amit Joshi, and the rest of the Titus team — Netflix
Outages
- Gmail
- Google Docs, Sheets, et al.
- YouTube TV
- During the World Cup match.
- Discord
- Discord had a couple of outages this week.
- Mastercard
- Facebook Messenger
- Snapchat
- 99acres (real estate site)
- Heroku
- Disney blames 4-hour tech woes on network maintenance
- Here’s an update on the Disney system outage I linked to last week.
Gabrielle Russon — Orlando Sentinel