SRE Weekly Issue #393

GitHub – teivah/sre-roadmap: An Opinionated Roadmap to Become an SRE (Concepts > Tools)

This repo contains a path to learn SRE, in the form of a list of concepts to familiarize oneself with.

Teiva Harsanyi

How can we justify the (sometimes significant) expense of instilling observability into our systems?

Nočnica Mellifera — SigNoz

1.1.1.1 lookup failures on October 4th, 2023

It was DNS. Cloudflare’s 1.1.1.1 recursive DNS service failed this week; the outage stemmed from a failure to parse the new ZONEMD record type.

Ólafur Guðmundsson — Cloudflare

CAP Theorem: Use It to Choose an Open Source Database

Rather than just dry theory, this article helps you understand what the CAP theorem means in practice as you choose a data store.

Note: this link was 504ing at time of publishing, so here’s the archive.org copy.

Bala Kalavala — Open Source For U

Whose fault was it anyway? On blameless post-mortems

A “blameless” culture can get in the way if it means you’re not allowed to make any mention of who was at the pointy end of your system when things blew up.

incident.io

Building Resilience in the Face of Disruption: LinkedIn’s Journey to ISO 22301 Certification

In this post, we will share how we formalized the LinkedIn Business Continuity & Resilience Program, how this new program helped increase our customers’ confidence in our operations, and the lessons that we learned as we attained ISO 22301 certification.

Chau Vu — LinkedIn

SRE Interview Prep Plan | Week 1

This is the start of a 6-article series, with each going through one week along a path to prepare for SRE interviews.

We’ll spend each week focusing on building up your expertise in the key areas SREs need to know, like automation, monitoring, incident response, etc.

Code Reliant

PACELC Theorem Explained: Distributed Systems Series

Beyond the CAP theorem, what actually happens during a partition?

“if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency and consistency (L and C)” [Daniel J. Abadi]

Lohith Chittineni
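
The if/else shape of Abadi's definition can be written down directly. The following is a minimal sketch, not code from the linked article: the names `PacelcProfile`, `PartitionChoice`, and `NormalChoice` are hypothetical, made up here for illustration, and the profile does not describe any particular database.

```python
from dataclasses import dataclass
from enum import Enum


class PartitionChoice(Enum):
    """What the system favors while a partition is in effect (the "PAC" half)."""
    AVAILABILITY = "A"
    CONSISTENCY = "C"


class NormalChoice(Enum):
    """What the system favors when there is no partition (the "ELC" half)."""
    LATENCY = "L"
    CONSISTENCY = "C"


@dataclass(frozen=True)
class PacelcProfile:
    """Hypothetical description of a data store's two trade-offs."""
    on_partition: PartitionChoice
    otherwise: NormalChoice

    def label(self) -> str:
        # Common shorthand for a PACELC classification, e.g. "PA/EL" or "PC/EC".
        return f"P{self.on_partition.value}/E{self.otherwise.value}"

    def priority(self, partitioned: bool) -> str:
        # Mirrors the quote: if partitioned, A versus C; else, L versus C.
        if partitioned:
            return self.on_partition.name.lower()
        return self.otherwise.name.lower()


if __name__ == "__main__":
    profile = PacelcProfile(PartitionChoice.AVAILABILITY, NormalChoice.LATENCY)
    print(profile.label())
    print(profile.priority(partitioned=True))
    print(profile.priority(partitioned=False))
```

A store that gives up consistency in both situations would be labeled PA/EL under this sketch, and one that keeps consistency in both would be PC/EC.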
