SRE Weekly Issue #135

Articles

Using the internet without the Amazon Cloud

What might an AWS outage look like? Try this new simulation tool to find out!

It’s not something you’ll want to use for too long (the internet is better when it works, it turns out), but it’s a view that’s well worth taking in, if only to taste the sheer scope of Amazon’s server empire.

Russell Brandom — The Verge (tool by Dhruv Mehrotra)

How to set up high availability storage with GlusterFS on Ubuntu 18.04

This article goes step-by-step through setting up a 3-server GlusterFS cluster.

Jack Wallen — TechRepublic

PagerDutyAMA: Alice Goldfuss

My favorite part of this is the concept of vacations as a “human game day”. Can we survive without you?

Matt Stratton — PagerDuty (with Alice Goldfuss)

Application Safety and Correctness Cannot Be Offloaded to Istio or Any Service Mesh

One question I have been seeing is “if Istio provides reliability for me, do I have to worry about it in my application?”

The answer is: abso-freakin-lutely :)

Christian Posta

The Seattle Plane Crash: Lessons and Questions

This take on the theft and crashing of an airplane in Seattle is applicable to SRE in multiple ways. It includes discussion of the incident response and some thoughts on what level of risk for extremely rare events is acceptable.

James Fallows — The Atlantic

Twitter: @dbaops on what SRE is

Two funny GIFs about SRE. Full disclosure: @dbaops is my boss and this stemmed from a DM conversation between us.

@dbaops on Twitter

Health Checks and Graceful Degradation in Distributed Systems

Coarse-grained health checks might be sufficient for orchestration systems, but prove to be inadequate to ensure quality-of-service and prevent cascading failures in distributed systems.

Cindy Sridharan
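
As a rough illustration of the distinction drawn in that quote, here is a minimal sketch of a health check that reports more than up or down. It is not taken from the linked article; the names `Health`, `Dependency`, and `evaluate`, and the example dependencies, are hypothetical, and the three-state model is just one assumed way to express degradation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"


@dataclass
class Dependency:
    # Hypothetical probe: returns True when the dependency responds.
    check: Callable[[], bool]
    # False means the service can keep serving without this dependency.
    critical: bool


def evaluate(deps: Dict[str, Dependency]) -> Health:
    """Combine per-dependency probes into a three-state result."""
    status = Health.HEALTHY
    for dep in deps.values():
        try:
            ok = dep.check()
        except Exception:
            ok = False  # a probe that raises counts as a failure
        if ok:
            continue
        if dep.critical:
            return Health.UNHEALTHY  # cannot serve at all
        status = Health.DEGRADED  # can serve with reduced functionality
    return status


# Assumed example: the database is up, an optional feature is down.
deps = {
    "database": Dependency(check=lambda: True, critical=True),
    "recommendations": Dependency(check=lambda: False, critical=False),
}
print(evaluate(deps).value)  # degraded
```

A load balancer or orchestrator could keep routing to a degraded instance while alerting on it, instead of pulling it out of rotation and shifting its load onto the remaining instances.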
