SRE Weekly Issue #261

A message from our sponsor, StackHawk:

Join Snyk and StackHawk on March 18 as they walk through how to use Software Composition Analysis (SCA) and Dynamic Application Security Testing (DAST) in CI/CD to ship more secure applications.
http://sthwk.com/snyk-stackhawk-webinar

Articles

I find it really refreshing that fighter pilots have a retrospective about every single mission, successful or not. There’s always something to learn.

Jessica Abelson — Transposit

Heroku applies the Incident Management System, designating an Incident Commander who keeps the incident on track and oversees communications, both external and internal.

Guillaume Winter — Heroku

This story is becoming common: Khan Academy saw a sudden influx of traffic when pandemic lockdowns began. Their strategy for handling it relied on the cloud and a CDN.

Marta Kosarchyn — Khan Academy

Full disclosure: Fastly, my employer, is mentioned.

Here’s a great summary of how Squarespace does SRE.

Franklin Angulo — Squarespace

Leaders at Deliveroo, DigitalOcean, Fastly, and Headspace share how their organizations think about reliability and resiliency and their advice to engineering orgs embarking on reliability journeys.

The leaders each answer a series of questions about how their organization handles reliability, giving an interesting compare-and-contrast overview.

Increment

Full disclosure: Fastly is my employer.

Using a disaster plan created after a devastating hurricane, Freshworks survived and thrived during the pandemic, delivering a major new product by its pre-pandemic deadline.

Ipsita Agarwal — Increment

This one explains what a canary deployment is, how it can help you, and how canary deployments differ from blue/green deployments.

LaunchDarkly
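
As a rough illustration of the distinction (my own sketch, not code from the LaunchDarkly article): a canary deployment sends a small, stable slice of users to the new version, while blue/green switches everyone at once. The function names, the "stable"/"canary" labels, and the hash-based bucketing are all assumptions made for this example.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99 (hypothetical helper)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def canary_version(user_id: str, canary_percent: int) -> str:
    """Canary: only users whose bucket falls below canary_percent get the new version."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


def blue_green_version(active: str) -> str:
    """Blue/green: every user gets whichever environment is currently active."""
    return active


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    on_canary = sum(canary_version(u, 5) == "canary" for u in users)
    print(f"{on_canary} of {len(users)} users routed to the canary")
```

Because the bucket is derived from a hash of the user ID, a given user stays on the same version as the percentage is raised, which is one common way to keep a canary rollout consistent.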

This article explains the meaning of a growth mindset and shows how it applies to SRE.

Emily Arnott — Blameless

Outages

  • Fastly
    • Full disclosure: Fastly is my employer.
  • OVH Cloud
  • All domains containing “t.co” in Russia
    • It appears that Russia tried to impair access to Twitter’s URL-shortening domain t.co, but their pattern-matching was overzealous and affected any domain that contained “t.co” (think reddit.com, microsoft.com, and many others).
  • Dyn
    • Dyn had a DNS outage. I noted impact to Heroku, but I didn’t see any other related outage postings.
  • Chef
  • GitHub
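
The t.co item above is a nice example of how a substring match goes wrong. Here is a minimal sketch of the difference, assuming the block was a plain "contains" test; the actual filtering mechanism isn't described in the reports I saw, and both function names are made up for illustration.

```python
def naive_match(hostname: str, blocked: str = "t.co") -> bool:
    """Overzealous: true if the blocked string appears anywhere in the hostname."""
    return blocked in hostname.lower()


def domain_match(hostname: str, blocked: str = "t.co") -> bool:
    """Stricter: true only for the blocked domain itself or one of its subdomains."""
    host = hostname.lower().rstrip(".")
    return host == blocked or host.endswith("." + blocked)


if __name__ == "__main__":
    for name in ["t.co", "www.t.co", "reddit.com", "microsoft.com"]:
        print(name, naive_match(name), domain_match(name))
```

With the substring test, reddit.com and microsoft.com both match because each contains "t.co" as part of "t.com". The stricter test compares whole domain labels, so only t.co and its subdomains match.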
Updated: March 14, 2021 — 9:05 pm
A production of Tinker Tinker Tinker, LLC