SRE Weekly Issue #284

Like last week, I prepared this week’s issue in advance, so no Outages section.  Have a great week!

A message from our sponsor, StackHawk:

Trying to automate application and API security testing? See how StackHawk and Burp Suite Enterprise stack up:


Soundcloud is very clear on the fact that they are not at Google scale. It’s interesting to see how they apply SRE principles at their scale.

Björn “Beorn” Rabenstein — SoundCloud

Here’s why Target set up their ELK stack, and how they used it to troubleshoot a problem in ElasticSearch itself.

Dan Getzke — Target

A key point in this article is that calculating your error budget as just “100% – SLO” goes about things backward.

Adam Hammond — Squadcast

They periodically scale up their systems just to test and be sure they’ll be ready for big events like Black Friday / Cyber Monday.

Kathryn Tang — Shopify

In this post, we’ll focus on service ownership. Why is service ownership important? How should teams self-organize to achieve it? Where’s the best place to start?


This fun troubleshooting story hinges around the internal details of how PostgreSQL’s sequences work.

Pete Hamilton —

Updated: August 22, 2021 — 7:38 pm
A production of Tinker Tinker Tinker, LLC Frontier Theme