I don’t know how I missed this at the time! This is a fabulous article about finding and fixing a NUMA-related performance issue with ScyllaDB that starts high level, and when you think it’s gone as deep as it can go, it goes deeper. I learned a ton reading this!
This is a great read. As you said it is interesting to follow along as they dug deeper and deeper into the problem, eliminating possible problems along the way. But, you buried the lead here, the fix was about 10 characters long. This is movie level hacking, although the movie would have dramatic music, ten seconds of thinking and a couple keystrokes, leaving out the hours of debugging and tracing that was needed to get to the solution. Not to mention all the testing that would be needed afterwards.
What are people’s best stories for bug fixing that took hours, but GitHub stats would say it should have taken seconds to fix?
highlighting this as a great “Weekend Read”