SRE Weekly Issue #461

The importance of resilience engineering

Written in 2020 after an AWS outage, this article analyzes dependence on third-party services and the responsibility to understand their reliability.

Uwe Friedrichsen

How we invalidate cache for resource-heavy & long-running requests

When a cache expired, these folks found that their application stampeded the database with expensive queries, so they searched for a solution.

Punit Sethi

The danger of overreaction

When a high-severity incident happens, its associated risk becomes salient: the incident looms large in our minds, and the fact that it just happened leads us to believe that the risk of a similar incident is very high.

Lorin Hochstein

Managing Trace Volume at monday.com

These folks landed on a hybrid approach using two vendors, allowing them to avoid sending their entire trace volume to an expensive observability vendor.

Jakub Sokół — monday

Adaptive LIFO

Under heavy load, requests are handled in LIFO order to maximize the chance of successfully completing fresh requests.

LIFO = Last In, First Out

Teiva Harsanyi
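
To make the idea above concrete, here is a minimal Python sketch of an adaptive LIFO queue. It is an illustration only, not the linked article's implementation: the class name `AdaptiveLifoQueue` and the `lifo_threshold` parameter are hypothetical, and "heavy load" is assumed to mean "the backlog is longer than the threshold".

```python
from collections import deque


class AdaptiveLifoQueue:
    """Serves requests FIFO normally, and LIFO while the backlog is large.

    `lifo_threshold` is a hypothetical tuning knob: when more than this many
    requests are waiting, we assume the system is under heavy load.
    """

    def __init__(self, lifo_threshold):
        self._items = deque()
        self._lifo_threshold = lifo_threshold

    def put(self, request):
        # New requests always join at the right end of the deque.
        self._items.append(request)

    def get(self):
        if not self._items:
            raise IndexError("queue is empty")
        if len(self._items) > self._lifo_threshold:
            # Heavy load: serve the newest request first (LIFO), since the
            # oldest ones are the most likely to have timed out already.
            return self._items.pop()
        # Normal load: serve the oldest request first (FIFO).
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


queue = AdaptiveLifoQueue(lifo_threshold=2)
for request_id in [1, 2, 3, 4]:
    queue.put(request_id)

# Drain the queue and record the order in which requests are served.
served = [queue.get() for _ in range(4)]
print(served)
```

With four requests waiting and a threshold of two, the two newest requests are served first; once the backlog shrinks to the threshold, the queue falls back to serving the oldest request first.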

Kafka vs NATS: A Comparison for Message Processing

More than just a simple feature comparison, this article also presents two use cases and analyzes which tool is the better fit for each.

Josson Paul Kalapparambath — DZone

Go All the Way: Why Golang is Your Swiss Army Knife for Modern Development

These folks explain why they use Go for everything: application code, infrastructure as code, tooling, and even as a wrapper around Helm charts for Kubernetes.

Akhilesh Krishnan — Oodle AI


