If you have a minute (it’ll only take one!), would you please fill out this survey? Gabe Abinante (featured here previously) is gathering information about the on-call experience with an eye toward presenting it at Monitorama.
Wow, what a resource! As the URL says, this is “some ops for devs info”. Tons of links to useful background for developers who are starting to learn how to do operations. Thanks to the author for the link to SRE Weekly!
AWS Lambda response time can increase sharply if your function is accessed infrequently. I love the graphs in this post.
A top-notch article on how to avoid common load-testing pitfalls. Great for SREs as well as developers!
A description of an investigation into poor performance in a service whose SLA requires 100% of requests to complete in under 5ms.
Docker posted this article on how they designed InfraKit for high availability.
A blanket block of ICMP on your network device breaks important features such as ping, traceroute, and MTU discovery. MTU discovery (Fragmentation Required) is especially important, and dropping those messages can cause connections to appear to time out for no obvious reason.
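To make the alternative to a blanket block concrete, here is a minimal sketch of an allowlist check for ICMPv4 messages. The type and code numbers are the standard ICMPv4 assignments; the `ESSENTIAL_ICMP` table and the `should_allow` helper are hypothetical names invented for illustration, and the set of allowed messages is an assumption, not a complete or recommended firewall policy.

```python
from typing import Optional

# Illustrative allowlist of ICMPv4 messages, keyed by (type, code).
# A code of None means "any code for this type".
# Which messages to allow is an assumption made for this sketch.
ESSENTIAL_ICMP: dict[tuple[int, Optional[int]], str] = {
    (0, None): "echo reply (responses to ping)",
    (3, 4): "destination unreachable: fragmentation needed (path MTU discovery)",
    (8, None): "echo request (ping)",
    (11, None): "time exceeded (used by traceroute)",
}


def should_allow(icmp_type: int, icmp_code: int) -> bool:
    """Return True if the ICMP message is on the allowlist (hypothetical helper)."""
    return (
        (icmp_type, icmp_code) in ESSENTIAL_ICMP
        or (icmp_type, None) in ESSENTIAL_ICMP
    )


if __name__ == "__main__":
    # Fragmentation needed (type 3, code 4) passes; an ICMP redirect (type 5) does not.
    print(should_allow(3, 4))
    print(should_allow(5, 0))
```

The point of the sketch is only that filtering by type and code lets the messages that ping, traceroute, and MTU discovery depend on through, while everything else can still be dropped.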