SRE Weekly Issue #429

Time to get down into the bits and bytes of how Honeycomb queries work with this look into a recent optimization in their data storage layer.

Hazel Edmands — Honeycomb

Full disclosure: Honeycomb is my employer.

How HelloFresh Uses SLOs for Progressive Delivery

Here’s how HelloFresh integrated SLOs into their internal platform’s new progressive rollout capability.

Victor Hugo Brito Fernandes — HelloFresh

The Review Is the Action Item

I like to consider running an incident review to be its own action item. Other follow-ups emerging from it are a plus, but the point is to learn from incidents, and the review gives room for that to happen.

Fred Hebert
Note: Fred is my coworker, and I’m mentioned in this article.

Building On-Call Schedules for Humans

This article covers a wealth of topics around creating an on-call system.

Learn how to navigate vacations, parenthood and personal preferences to improve your reliability practice.

Rootly

SRE and the flood tragedy in Southern Brazil: An Analogy for Resilience

There has been major flooding in Brazil recently, and this article looks at it through an SRE lens. Note: the main article is in Portuguese, with an English translation lower down the page.

Dario Bestetti

Building IT Systems with Well-Architected Framework & Infrastructure as Code (With practical AWS…

This article shows you how to use Infrastructure as Code to implement AWS’s Well-Architected Framework, with Terraform examples.

Lokesh Aggarwal

To Auto Scale or not to Auto Scale, that is the question

The challenges of Auto Scaling, from cold start impact and tech debt to cost realities. Prioritising scaling as code and shared responsibility for optimal performance and cloud efficiency.

Karl Stoney

Action item template

For each post-incident action that you are proposing, we would appreciate it if you would fill out the following template.

Looking at the author, you know this one’s not going to just be what it says on the tin. It’s a thought-provoking exploration of the meaning and purpose of post-incident action items.

Lorin Hochstein
