SRE Weekly Issue #243

A message from our sponsor, StackHawk:

The shift to rapid, frequent deployments over the past decade initially left application security behind. Modern AppSec belongs in the CI/CD pipeline.
http://sthwk.com/app-sec-pipeline

Articles

Sometimes I come across a simple but mind-blowingly awesome new idea. This is one of those times.

During periods of high load and errors, Netflix’s edge load balancer sends feedback to the apps running on users’ devices, adjusting their retry and backoff strategy to keep the service running as smoothly as possible while avoiding a thundering herd. Brilliant.

Manuel Correa, Arthur Gonigberg, and Daniel West — Netflix
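To make the idea concrete, here is a rough sketch of how a client might honor server-sent retry hints. This is a minimal Python illustration built on my own assumptions, not Netflix's implementation. The `ServerHint` and `RetryPolicy` fields, the default values, and the choice of exponential backoff with "full jitter" are all hypothetical. The one design rule it tries to capture is that the server can only make clients back off harder (fewer retries, longer delays), never more aggressively.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RetryPolicy:
    """Client's local default retry policy (hypothetical fields and values)."""
    max_retries: int = 3
    base_delay_s: float = 0.1
    max_delay_s: float = 5.0


@dataclass
class ServerHint:
    """Hypothetical feedback an overloaded edge could attach to an error response."""
    max_retries: Optional[int] = None
    backoff_multiplier: float = 1.0


def apply_hint(policy: RetryPolicy, hint: Optional[ServerHint]) -> RetryPolicy:
    """Return the policy adjusted by the hint; the server may only tighten it."""
    if hint is None:
        return policy
    retries = policy.max_retries
    if hint.max_retries is not None:
        retries = min(retries, hint.max_retries)  # never allow more retries than the default
    factor = max(hint.backoff_multiplier, 1.0)  # never shrink delays below the default
    return RetryPolicy(
        max_retries=retries,
        base_delay_s=policy.base_delay_s * factor,
        max_delay_s=policy.max_delay_s * factor,
    )


def backoff_delay(policy: RetryPolicy, attempt: int, rng: random.Random) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(policy.max_delay_s, policy.base_delay_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(
    send: Callable[[], Tuple[bool, Optional[ServerHint]]],
    base: RetryPolicy,
    sleep: Callable[[float], None],
    rng: random.Random,
) -> bool:
    """Call send() until it succeeds or the (possibly server-tightened) retry budget runs out."""
    attempt = 0
    while True:
        ok, hint = send()
        if ok:
            return True
        # The most recent hint wins; with no hint, fall back to the local default policy.
        policy = apply_hint(base, hint)
        if attempt >= policy.max_retries:
            return False
        sleep(backoff_delay(policy, attempt, rng))
        attempt += 1
```

In a real app `sleep` would be something like `time.sleep`, and `send` would wrap the actual network call. Randomizing each delay (the jitter) is what spreads retries out so that millions of devices don't all come back at the same instant.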

I helped to invent new approaches to correlate telemetry signals (exemplars, correlation between tracing and logging, profiler labels) that helped our engineers to navigate latency problems faster.

Facebook has two very different kinds of live-streaming users: “normal” users, and broadcasters streaming sporting events and the like.

Hemal Khatri, Alex Lambert, Jordi Cenzano and Rodrigo Broilo — Facebook

This article covers the outcomes of research performed in 2019 on how engineers at Google debug production issues, including the types of tools, high-level strategies, and low-level tasks that engineers use in varying combinations to debug effectively.

Charisma Chan and Beth Cooper — Google

The three patterns discussed in this paper are:

  • decompensation
  • working at cross purposes
  • getting stuck in outdated behaviors

David Woods and Matthieu Branlat

Outages

Updated: November 8, 2020 — 8:21 pm