The author describes their approach to tracking down and fixing a memory leak in a Node.js library that was causing out-of-memory (OOM) errors in a dozen services.
The author first tried to understand the code, then attempted to reproduce the issue in isolation, but could not spot the problem that way.
To capture profiling data, the author connected Chrome DevTools to the staging services, took heap snapshots over time, and compared them to find the largest positive delta, which appeared under the string constructor.
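The snapshot comparison described above can be sketched in code. This is a minimal sketch, not the author's setup: the article attaches Chrome DevTools to a running service, whereas this example uses Node's built-in `v8.writeHeapSnapshot()` to write files that can be loaded into the DevTools Memory tab afterwards. The helper names `captureSnapshot` and `largestPositiveDelta`, and the `{ constructorName: sizeInBytes }` summary shape, are assumptions made for illustration.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Write a heap snapshot to disk and return its file name.
// The resulting .heapsnapshot file can be loaded in the Memory tab of Chrome DevTools.
// Note: this call is synchronous and blocks the event loop while the snapshot is written.
function captureSnapshot(label, dir = os.tmpdir()) {
  const file = path.join(dir, `${label}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file);
}

// Hypothetical helper: given two per-constructor size summaries
// (for example, figures noted down from two snapshots), return the
// constructor whose size grew the most, or null if nothing grew.
function largestPositiveDelta(before, after) {
  let worst = null;
  for (const [name, size] of Object.entries(after)) {
    const delta = size - (before[name] ?? 0);
    if (delta > 0 && (worst === null || delta > worst.delta)) {
      worst = { name, delta };
    }
  }
  return worst;
}
```

Taking one snapshot shortly after startup and another after the service has handled traffic for a while gives the two points to compare; a constructor whose size only ever grows between snapshots is a reasonable first suspect.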
The author then narrowed the investigation to the code that pushes items onto an array and pops them off again, which relies on the sentEvents object to produce and consume events.
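The kind of leak this push/pop pattern can produce is easier to see in a small example. The sketch below is a hypothetical reconstruction, not the library's code: only the name `sentEvents` comes from the article, while the class `EventTracker`, its methods, and the assumed shape of `sentEvents` (a topic mapped to an array of pending payloads) are invented for illustration.

```javascript
// Hypothetical reconstruction: only the name `sentEvents` comes from the article.
class EventTracker {
  constructor() {
    // Assumed shape: topic -> array of payloads that were produced but not yet consumed.
    this.sentEvents = {};
  }

  // Producing an event pushes its payload onto the array for that topic.
  produce(topic, payload) {
    if (!this.sentEvents[topic]) {
      this.sentEvents[topic] = [];
    }
    this.sentEvents[topic].push(payload);
  }

  // Leaky variant: reads the most recent payload but never removes it,
  // so every produced payload stays reachable for the life of the process.
  peekLeaky(topic) {
    const queue = this.sentEvents[topic];
    return queue ? queue[queue.length - 1] : undefined;
  }

  // Fixed variant: pops the payload and drops the empty array,
  // so consumed events can be garbage collected.
  consume(topic) {
    const queue = this.sentEvents[topic];
    if (!queue) return undefined;
    const payload = queue.pop();
    if (queue.length === 0) delete this.sentEvents[topic];
    return payload;
  }

  // Number of payloads still held in memory.
  pendingCount() {
    return Object.values(this.sentEvents).reduce((n, q) => n + q.length, 0);
  }
}
```

If the payloads are strings, a mismatch like this between pushes and pops would show up in heap snapshots as steadily growing string memory, which matches the symptom described earlier.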
The patch drastically reduced memory usage and improved the services' reliability. It also halved the number of pods needed to handle the same traffic.
This article serves as a good starting point for anyone interested in learning more about tracking memory issues in Node.js.