SRE Weekly Issue #289

Articles

How SREs are unique in their approach to work

Here are some things that make SREs a unique breed in software work:

The one about Scrum caught my eye, and I followed the links through to the Stack Overflow post about SRE and Scrum.

Ash P — Cruform

Linux Page Cache for SRE

An in-depth explainer on the Linux page cache, full of details and experiments.

Viacheslav Biriukov

Just got my first SRE job. I start tomorrow, any advice?

There’s some great advice in this reddit thread… and maybe some tongue-in-cheek advice too.

Take production down the first day they give access — then it’s nothing but up from there!

Various — reddit

Dark Side of Self-Service

Using two real-world case studies, this article explains how developer self-service can go wrong, and then discusses how to avoid these pitfalls.

Kaspar von Grünberg — humanitec

What is expected in the SRE role? We analyzed 30 job postings to find out.

What a great idea! I found it especially interesting that only 34% of SRE job postings mention defining SLIs/SLOs/error budgets.

Pruthvi — Spike.sh

10 questions teams should be asking for faster incident response

For the first time, we’ve created the State of Digital Operations Report which is based on PagerDuty platform data.
[…]
we will walk through some of these findings and share 10 questions teams can ask themselves to improve their incident response.

Hannah Culver — PagerDuty

How to avoid bad assumptions during incidents

Incident response so often gets mired in assumptions that need to be re-evaluated. This article uses an incident as a case study.

Lawrence Jones — incident.io

SRE vs. DevOps: What are the Differences?

This one lays out clear definitions of SRE and DevOps and compares and contrasts them.

Mateus Gurgel — Rootly

Merlion: A Machine Learning Library for Time Series

This week, Salesforce released Merlion, a Python library for time series machine learning and anomaly detection. Linked is an in-depth research paper on Merlion, explaining its theory of operation and experimental results.

Bhatnagar et al. — Salesforce

Outages

reddit
Atlassian Statuspage.io
Giant Pay
Trello
- Trello had major outages on two consecutive days (here’s the other).
