This is a tough read, but really enlightening.

Thanks to Courtney Eckhardt for this one.

William Langewiesche — The Atlantic

Read this to find out why it’s so hard to nail down SLOs for cloud services.

Adrian Colyer — The Morning Paper (summary)

Mogul & Wilkes (original paper)

BGP: the horrifying, ugly monster lurking at the base of the Internet.

Stilgherrian — ZDNet

A different kind of monster.

Will Oremus — Slate

When you’re thinking about best practices for improving phases of the production incident cycle, there are three SRE principles that particularly matter for this task. Keep these in mind as you think about reliability.

Myk Taylor — Google

It’s important to remember that there’s more to incident response than the technical aspect.

George Miranda — PagerDuty

Read this Second Officer’s account of a maritime near-miss and the five lessons they drew from it. My favorite:

As professionals, we always have more than one goal.

Nippin Anand — Safety Differently


