SRE Weekly Issue #429

A message from our sponsor, FireHydrant:

We’ve gone all out on our new integration with Microsoft Teams. If you’re an MS Teams user, FireHydrant now supports the most comprehensive integration for incident management. Run the entire incident management process without ever leaving the chat.

https://firehydrant.com/blog/introducing-a-brand-new-microsoft-teams-integration/

Time to get down into the bits and bytes of how Honeycomb queries work with this look into a recent optimization in their data storage layer.

  Hazel Edmands — Honeycomb

  Full disclosure: Honeycomb is my employer.

Here’s how HelloFresh integrated SLOs into their internal platform’s new progressive rollout capability.

  Victor Hugo Brito Fernandes — HelloFresh

I like to consider running an incident review to be its own action item. Other follow-ups emerging from it are a plus, but the point is to learn from incidents, and the review gives room for that to happen.

  Fred Hebert

  Note: Fred is my coworker, and I’m mentioned in this article.

This article covers a wealth of topics around creating an on-call system.

Learn how to navigate vacations, parenthood and personal preferences to improve your reliability practice.

  Rootly

There has been major flooding in Brazil recently, and this article looks at it through an SRE lens. Note: the main article is in Portuguese, with an English translation lower down the page.

  Dario Bestetti

This article shows you how to use Infrastructure as Code to implement AWS’s Well-Architected Framework, with Terraform examples.

  Lokesh Aggarwal

The challenges of Auto Scaling, from cold start impact to tech debt and cost realities. Prioritising scaling as code and shared responsibility for optimal performance and cloud efficiency.

  Karl Stoney

For each post-incident action that you are proposing, we would appreciate it if you would fill out the following template.

Looking at the author, you know this one’s not going to just be what it says on the tin. It’s a thought-provoking exploration of the meaning and purpose of post-incident action items.

  Lorin Hochstein

Updated: June 16, 2024 — 8:57 pm
A production of Tinker Tinker Tinker, LLC