Articles
This latest in the CRE Life Lessons series takes on dependencies and how they impact a service’s SLO in obvious and subtle ways.
Robert van Gent — Google
This company discovered that the benefits of microservices came with some significant downsides. Here’s how they turned to chaos testing to improve reliability.
Meredith Courtemanche — TechTarget
Keeping in mind that this is written by the CTO of Gremlin, it contains some good points about buying versus building your chaos engineering system. It would apply to other chaos engineering services too — if there were any.
Matt Fornaciari — Gremlin, Inc.
Even as an experienced Terraform user, I learned about some Terraform features I hadn’t been aware of.
Nic Jackson — HashiCorp
In issue #98, I linked to a recording of John Allspaw’s DOES17 talk. In case you didn’t have time to listen, here’s a transcript. If you didn’t have time to read the Stella Report, I highly recommend reading this as an intro to the major concepts therein.
John Allspaw
Outages
- Fastly
- Full disclosure: Fastly is my employer.
- Travis CI
- Python Package Index (PyPI)
- Honeycomb
- Wow, I just love Honeycomb’s post-incident analyses, and this one is no exception. Highly recommend.
Rule of thumb as a developer: it’s probably not the database, it’s probably your code.
Turns out that it was, in this case!
Andy Isaacson
- Hulu
- MasterCard