SRE Weekly Issue #209

A message from our sponsor, VictorOps:

Efficient management of SQL schema evolutions allows DevOps professionals to deploy code quickly and reliably with little to no impact. Learn how modern teams are building out zero-impact SQL database deployment workflows here:

https://go.victorops.com/sreweekly-zero-impact-sql-database-deployments

Articles

Azure developed this tool to sniff out production problems caused by deploys and guess which deploy might have been the culprit. Its accuracy is impressive.

Adrian Colyer — The Morning Paper (summary)

Li et al. — NSDI’20 (original paper)

This one made me laugh out loud. Better check those system call return codes, people.

rachelbythebay
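For anyone wondering what "checking return codes" looks like in practice, here is a minimal sketch in Python. It is not taken from the linked post; the helper name `write_all` is hypothetical. Python already turns most failed system calls into `OSError`, but `os.write` can still report a short write through its return value, and ignoring that value silently drops data.

```python
import os


def write_all(fd: int, data: bytes) -> int:
    """Write all of `data` to `fd`, looping until every byte is accepted.

    os.write() returns the number of bytes actually written, which may be
    fewer than requested. Discarding that return value is the bug.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        written = os.write(fd, view[total:])
        if written == 0:
            # No progress was made; raise rather than loop forever.
            raise OSError("write returned 0 bytes")
        total += written
    return total
```

The same habit applies in C, where the check is on the `-1` return value and `errno` rather than on an exception.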

This caught my eye:

In addition, what is seen as the IC maintaining organizational discipline during a response can actually be undermining the sources of resilient practice that help incident responders cope with poorly matched coordination strategies and the cognitive demands of the incident.

Laura M.D. Maguire — ACM Queue Volume 17, Issue 6

A guide on salary expectations for various levels of SRE, especially useful if you’re changing jobs.

Gremlin

The flipside of microservices agility is the resiliency you can lose from service distribution. Here are some microservices resiliency patterns that can keep your services available and reliable.

Joydip Kanjilal
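The blurb above does not say which patterns the article covers, so as an assumption, here is a rough sketch of one widely used resiliency pattern, the circuit breaker. The class name, parameters, and defaults are all hypothetical and chosen for illustration; a production service would more likely use an existing library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each outbound call as `breaker.call(fetch_profile, user_id)` (where `fetch_profile` stands in for any remote call) lets a struggling dependency recover instead of being hammered by retries.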

Several consumer devices have failed recently because of cloud service outages, and this author argues for change.

Kevin C. Tofel — Stacey on IoT

This sounds familiar

Durham Radio News

Essentially, you’re taking that risk of the Friday afternoon deployment, and spreading it thinly across many deployments throughout the week.

Ben New

Outages

Updated: March 2, 2020 — 12:02 am
A production of Tinker Tinker Tinker, LLC