Articles
The best kind of engineer is one who understands not only their specialty, but at least something about the fields adjacent to theirs. The empathy this confers allows one to work incredibly effectively across the company. For SREs, this is even more important.
[…] many of us are finding that the most valuable skill sets sit at the intersection of two or more disciplines.
Charity Majors — Honeycomb
GitLab held a session about recognizing and preventing burnout at their recent employee summit. They share the best tips in this article, and true to their radically open culture, they also added what they learned to their employee handbook, which is publicly available.
Clement Ho — GitLab
Here’s a post-analysis of a Travis CI incident from early last year. Although a couple of easy targets could have been labelled as “root cause”, they instead skillfully laid out all of the contributing factors and left it at that.
Travis CI
What indeed? The same thing, just organized differently. There’s a lot of great analysis here about how ops roles can adapt to a serverless infrastructure, and how teams can best make use of ops folks.
Tom McLaughlin — ServerlessOps
Charity Majors wants you to look forward to on-call. This superb write-up of her recent conference talk explains why folks should think of on-call as an enjoyable privilege and how to shape your on-call to get there.
Jennifer Riggins
The Canary Analysis Service is Google’s internal tool that automatically analyzes canary runs and decides whether performance has been negatively impacted. My favorite section is the Lessons Learned.
Štěpán Davidovič with Betsy Beyer — ACM Queue
Outages
- Snapchat
- 123 Reg (hosting provider)
- Customers lost files added since 123 Reg’s last valid backup from August 2017.
- partypoker
- eBay
- Signal and Telegram (messenger apps)
- Alexa
- I missed this one last week — it was apparently due to the AWS outage I reported on.
- TD Bank
- Oculus Rift
- A code-signing certificate expired, rendering some existing VR headsets non-functional. Oculus is issuing a $15 store credit to affected customers.
Because of the particulars of what expired and how it happened, the company wasn’t able to simply push an update out to users because the expired certificate was blocking Oculus’ standard software update system.