Time to get down into the bits and bytes of how Honeycomb queries work with this look into a recent optimization in their data storage layer.
Hazel Edmands — Honeycomb
Full disclosure: Honeycomb is my employer.
Here’s how HelloFresh integrated SLOs into their internal platform’s new progressive rollout capability.
Victor Hugo Brito Fernandes — HelloFresh
I like to consider running an incident review to be its own action item. Other follow-ups emerging from it are a plus, but the point is to learn from incidents, and the review gives room for that to happen.
Fred Hebert
Note: Fred is my coworker, and I’m mentioned in this article.
This article covers a wealth of topics around creating an on-call system.
Learn how to navigate vacations, parenthood and personal preferences to improve your reliability practice.
Rootly
There has been major flooding in Brazil recently, and this article looks at it through an SRE lens. Note: the main article is in Portuguese, with an English translation lower down the page.
Dario Bestetti
This article shows you how to use Infrastructure as Code to implement AWS’s Well-Architected Framework, with Terraform examples.
Lokesh Aggarwal
The challenges of Auto Scaling, from cold start impact to tech debt and cost realities. Prioritising scaling as code and shared responsibility for optimal performance in cloud efficiency.
Karl Stoney
For each post-incident action that you are proposing, we would appreciate it if you would fill out the following template.
Looking at the author, you know this one’s not going to just be what it says on the tin. It’s a thought-provoking exploration of the meaning and purpose of post-incident action items.
Lorin Hochstein