SRE Weekly Issue #213

Articles

COVID-19: Why We Should All Wear Masks — There Is New Scientific Rationale

This is important, and well worth a read. Where’s the SRE connection? The article explains that the U.S. Surgeon General’s comment that masks are “not effective” led to a stigma against those who wear them here. That kind of unintended sociological effect is commonly uncovered in post-incident analysis.

Sui Huang

Keeping the Internet “Always On” — the Pressure of COVID-19 on Incident Response Teams

PagerDuty ran the numbers and discovered an increase in incidents recently, especially in certain companies.

Rachel Obstler — PagerDuty

February service disruptions post-incident analysis

Here’s the scoop on all those GitHub incidents in February.

Keith Ballinger — GitHub

Embrace Resilience for Business Continuity in Times of Uncertainty

No, it won’t be possible to continue operating business-as-usual. For the unforeseeable future, teams across the world will be dealing with cutbacks, infrastructure instability, and more. However, with SRE best practices, your team can embrace resilience and adapt through this difficult time.

Hannah Culver — Blameless

Remote incident management

5 tips for incident management when you’re suddenly remote

I love the concept of “ephemeral information”, that is, discussions that happen out-of-band, making it much harder to analyze the incident after the fact.

Blake Thorne — Atlassian

Elastic Cloud January 18, 2019 Incident Report

Grey failure turned a seemingly reasonable auto-recovery mechanism into a DoS caused by a thundering herd.

Panagiotis Moustafellos, Uri Cohen, and Sylvain Wallez — Elastic

Outages

G Suite
Google Cloud Platform
- GCP had a major incident that caused the G Suite outage. GCP also had an (apparently) unrelated outage later in the day.
BitBay (cryptocurrency exchange)
Netflix
Uber
WhatsApp
Fastly
- Also this one. Full disclosure: Fastly is my employer.
Reddit
Discord
Brightcove
Zoom
DoorDash
Nest
Canvas (remote learning tool)
