SRE Weekly Issue #265

A message from our sponsor, StackHawk:

Join StackHawk and WhiteSource tomorrow morning to learn about automated security testing in the DevOps pipeline. With automated dynamic testing and software composition analysis, you can be sure you’re shipping secure APIs and applications. Grab your spot:
http://sthwk.com/stackhawk-whitesource

Articles

Here’s a great look into how LinkedIn’s embedded SREs work.

[…] the mission for Product SRE is to “engineer and drive product reliability by influencing architecture, providing tools, and enhancing observability.”

Zaina Afoulki and Lakshmi Namboori — LinkedIn

It’s all just other people’s caches.

Ruurtjan Pul

Recently there was a Reddit post asking for advice about moving from Site Reliability Engineering to Backend Eng. I started writing a response to it, the response got long, and so I turned it into a blog post.

Charles Cary — Shoreline

This is the first in a series about lessons SREs can learn from the space shuttle program. The author likens earlier spacecraft to microservices and the Shuttle to a monolith.

Robert Barron

This article is ostensibly about Emergency Medical Services (EMS), but as is so often the case, it’s directly applicable to SRE. The 5 characteristics are enlightening, and so is the fictitious anecdote about an EMT rattled from a previous incident.

Ems1

Simple solution meets reality. I like how we get to see what they did when things didn’t quite work out as they were hoping.

Robert Mosolgo — GitHub

They did the work to convert a database column to a 64-bit integer before it was too late. Unfortunately, one of their library dependencies didn’t use 64-bit integers.

Keith Ballinger — GitHub

In this post, I’ll walk you through one of our first ever Sidekiq incidents and how we improved our Sidekiq implementation as a result of this incident.

Nakul Pathak — Scribd

Outages

Updated: April 11, 2021 — 10:42 pm
A production of Tinker Tinker Tinker, LLC Frontier Theme