Articles
This is an excellent summary of a talk on testing in production given last month.
“Distributed systems are incredibly hostile to being cloned or imitated, or monitored or staged,” she said. “Trying to mirror your staging environment to production is a fool’s errand. Just give up.”
Joab Jackson — The New Stack
A look at the pros and cons of Calvin and Spanner, two data stores described in papers published in 2012. According to the author, Calvin largely stands out as the favorite.
Daniel Abadi
What a cool concept!
RobinHood brings SLO violations down to 0.3%, compared to 30% SLO violations under the next best policy.
Adrian Colyer — The Morning Paper (summary)
Berger et al. (original paper)
With thousands(!) of MySQL shards, Dropbox needed a way to have transactions span multiple shards while maintaining consistency.
Daniel Tahara — Dropbox
This is an excellent introduction to heatmaps, with some hints on how to interpret a couple of common patterns.
Danyel Fisher — Honeycomb
This is a neat idea. By modelling the relationships between the components in your infrastructure, you can figure out which one might be to blame when everything starts alerting at once. Note: this article is heavily geared toward Instana.
Steve Waterworth — Instana
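To make the dependency-modelling idea above a little more concrete, here is a minimal sketch. It is not Instana's method, and the topology, component names, and the `likely_culprits` helper are all hypothetical. It assumes you already have a map of which component depends on which, and it flags any alerting component whose direct dependencies are all healthy.

```python
from typing import Dict, List, Set


def likely_culprits(depends_on: Dict[str, List[str]], alerting: Set[str]) -> Set[str]:
    """Return alerting components none of whose direct dependencies are alerting.

    Assumption: an alert on a component with an alerting dependency is
    probably a symptom, so only the "deepest" alerting components are kept.
    """
    culprits = set()
    for component in alerting:
        deps = depends_on.get(component, [])
        if not any(dep in alerting for dep in deps):
            culprits.add(component)
    return culprits


# Hypothetical topology: each key depends on the components in its list.
depends_on = {
    "web": ["api"],
    "api": ["db", "cache"],
    "db": [],
    "cache": [],
}

# Everything upstream of the database is alerting at once.
alerting = {"web", "api", "db"}
print(likely_culprits(depends_on, alerting))  # {'db'}
```

This only looks at direct dependencies, so a healthy-looking component sitting between two alerting ones would split the chain. A real system would likely walk the graph transitively and weigh the alerts by timing and severity.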
Automated bug fixing seems to be all the rage lately. I wonder, is it practical for companies that aren’t the size of Facebook or Google?
Johannes Bader, Satish Chandra, Eric Lippert, and Andrew Scott — Facebook
Outages
- Slack in Europe
- Netflix
- Microsoft’s Windows license activation service
  Microsoft has acknowledged a problem affecting its Windows license activation servers in multiple countries that has resulted in users being told their Windows 10 Pro and Enterprise installations are invalid.
- Lloyds Bank
- GPS ankle bracelets in Australia
  A violent parolee is on the run after part of the GPS tracking system broke down due to Telstra’s network issues on Friday and Saturday.