Articles
I find it really refreshing that fighter pilots have a retrospective about every single mission, successful or not. There’s always something to learn.
Jessica Abelson — Transposit
Heroku applies the Incident Management System, designating an Incident Commander who keeps the incident on track and oversees communications, both external and internal.
Guillaume Winter — Heroku
This story is becoming common: Khan had a sudden influx of traffic when pandemic lockdowns began. Their strategy involved the use of the cloud and a CDN.
Marta Kosarchyn — Khan Academy
Full disclosure: Fastly, my employer, is mentioned.
Here’s a great summary of how Squarespace does SRE.
Franklin Angulo — Squarespace
Leaders at Deliveroo, DigitalOcean, Fastly, and Headspace share how their organizations think about reliability and resiliency and their advice to engineering orgs embarking on reliability journeys.
The leaders each answer a series of questions about how their organization handles reliability, giving an interesting compare-and-contrast overview.
Increment
Full disclosure: Fastly is my employer.
Using a disaster plan created after a devastating hurricane, Freshworks survived and thrived during the pandemic, delivering a major new product by its pre-pandemic deadline.
Ipsita Agarwal — Increment
This one explains what a canary deployment is, how it can help you, and how canary deployments differ from blue/green deployments.
LaunchDarkly
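
For readers who want the distinction in concrete terms, here is a minimal sketch of the two routing styles. It is not taken from the LaunchDarkly article: the function names (`bucket`, `route_canary`, `route_blue_green`) and the hash-based bucketing are assumptions made for illustration. A canary sends a small, stable slice of users to the new version, while blue/green sends everyone to whichever environment is currently active.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route_canary(user_id: str, canary_percent: int) -> str:
    """Canary: only users whose bucket falls below the threshold see the new version."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


def route_blue_green(active_environment: str) -> str:
    """Blue/green: every request goes to the single active environment."""
    return active_environment


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    on_canary = sum(route_canary(u, 5) == "canary" for u in users)
    print(f"canary share at 5%: {on_canary / len(users):.1%}")
    print("blue/green target:", route_blue_green("green"))
```

Hashing the user ID keeps each user on the same version between requests, so raising `canary_percent` only ever adds users to the canary group.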
This article explains the meaning of a growth mindset and shows how it applies to SRE.
Emily Arnott — Blameless
Outages
- Fastly
- Full disclosure: Fastly is my employer.
- OVH Cloud
- This week, there was a major fire at an OVH Cloud datacenter. As a result, Rust (an MMOG) permanently lost data, according to its creators.
- All domains containing “t.co” in Russia
- It appears that Russia tried to impair access to Twitter’s URL-shortening domain t.co, but their pattern-matching was overzealous and affected any domain that contained “t.co” (think reddit.com, microsoft.com, and many others).
- Dyn
- Dyn had a DNS outage. I noted impact to Heroku, but I didn’t see any other related outage postings.
- Chef
- GitHub
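
The t.co entry in the list above is a nice illustration of why substring matching on hostnames is risky. The sketch below is only a guess at the failure mode, not a description of how the filtering was actually built; both function names are hypothetical. It contrasts a naive "contains" check with a check that matches the domain itself or its subdomains.

```python
def naive_block(hostname: str, blocked: str = "t.co") -> bool:
    """Overzealous: blocks any hostname that merely contains the string."""
    return blocked in hostname.lower()


def domain_block(hostname: str, blocked: str = "t.co") -> bool:
    """Stricter: blocks only the domain itself or a subdomain of it."""
    host = hostname.lower().rstrip(".")
    return host == blocked or host.endswith("." + blocked)


if __name__ == "__main__":
    for host in ["t.co", "reddit.com", "microsoft.com", "example.org"]:
        print(f"{host:15} naive={naive_block(host)!s:5} strict={domain_block(host)}")
```

Both reddit.com and microsoft.com contain the characters "t.co" across the boundary between the name and ".com", which is why the naive check catches them.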