omg is that a memory leak?
I've spent a lot of time testing Node.js/Sails.js apps and modules for memory leaks (with the help of @particlebanana, @sgress454, and others). Fortunately, true memory leaks are pretty rare.
But I have encountered memory leaks in some application-level code at least once. I say "at least" because the rest of the Sails.js team and I have found WAAAAAYY more things that we thought were memory leaks, but actually weren't.
Which is good, I guess. But possibly even more annoying.
So I wanted to jot down a bit about what I've learned about memory leaks over the past five years -- and specifically the process that I use to diagnose whether one exists in a Node.js/Sails.js app.
1. Do everything you can to simplify your test environment.
Before you get started trying to diagnose a memory leak, you should rule out as many problems as possible. In a Node.js/Sails.js app, that starts with configuring recommended production settings.
2. Accept the crushing reality that it's probably all your fault.
Next, check your app-level code (i.e. try to isolate the leak to a single endpoint). In my experience, it is very common w/ Node apps in general to end up with leaks from failing to handle errors (e.g. if you're using promises and forget to do a .catch()).
If you can't isolate the source of increasing memory usage to a particular endpoint or anything else in your app code, then it's time to try replicating the leak in a brand new app with no bells and whistles, running with the same recommended production settings.
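To illustrate the error-handling point above, here is a minimal sketch of an Express/Sails-style action that always sends a response, even when its promise rejects. The names `findUser` and `getUser` are hypothetical, and `findUser` is just a stand-in for whatever async call your endpoint makes. Without the `.catch()`, a rejection would mean this handler never responds, so the request and everything it references can stay in memory longer than you expect.

```javascript
// Hypothetical stand-in for any async operation that can fail (e.g. a database query).
function findUser(id) {
  if (id === 'missing') {
    return Promise.reject(new Error('User not found'));
  }
  return Promise.resolve({ id });
}

// Hypothetical Express/Sails-style action: `res` only needs status() and json().
function getUser(req, res) {
  return findUser(req.params.id)
    .then((user) => res.status(200).json(user))
    // Without this .catch(), a rejection means no response is ever sent from here.
    .catch((err) => res.status(500).json({ error: err.message }));
}
```

If an endpoint looks leaky, checking that every promise chain in it ends like this is a cheap first step before reaching for heavier tools.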
3. Remind yourself what a memory leak is, actually
Take a moment to mutter to yourself to make sure you remember what you're doing.
I have no idea if this is true for you or not; but when I'm working on a problem, I have to constantly remind myself of what I'm doing. Otherwise I forget. Or worse, I'll realize what I was doing didn't really make any sense anyway.
So let's take a second to do that.
We're trying to figure out whether our Node process has a memory leak.
A Node process will grow its memory usage until it decides it's time to run the garbage collector. Just because memory is going up continually does not mean there is a memory leak; it may just mean that the garbage collector has not run yet. A leak is memory that never gets reclaimed, even after the garbage collector runs.
Sounds simple, right? But this tends to be where things break down. Until you've actually gone through the process of diagnosing a leak step by step, it's really hard to know wtf you're doing. And if you think you know wtf you're doing, you probably don't. (That was the case the first time I tried it anyway.)
So in the spirit of that, here are a few examples to give you some reps: (1, 2, 3)
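To make the distinction concrete, here is a small sketch of my own (not one of the linked examples) contrasting memory that merely has not been collected yet with memory that can never be collected. The function names and the `seen` array are hypothetical.

```javascript
// Not a leak: `buffer` becomes unreachable as soon as the function returns.
// Memory usage still climbs between garbage collector runs, but it can all be reclaimed.
function handleWithoutLeak() {
  const buffer = new Array(1000).fill('x');
  return buffer.length;
}

// A leak: every call adds to a module-level array that nothing ever empties,
// so the garbage collector can never reclaim those buffers.
const seen = [];
function handleWithLeak() {
  const buffer = new Array(1000).fill('x');
  seen.push(buffer);
  return buffer.length;
}
```

Both functions make a memory graph slope upward under load. Only the second one keeps the graph high after the garbage collector has run.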
4. Set up monitoring, recreate the supposedly leaky behavior, then use the garbage collector to verify that memory is never being reclaimed.
The only surefire way to diagnose a memory leak is to run your Node process with the --expose-gc flag enabled, manually run the garbage collector, and then compare the "valleys" of the chart. Here's a step-by-step guide:
4A. Expose a development-only endpoint that, when called, will run the garbage collector. For example: https://github.com/balderdashy/sails-hook-dev/blob/master/index.js#L190-L203 (you can also just install sails-hook-dev in your project).
4B. Start up a program that monitors your Node process's memory. I recommend NodeSource.
4C. Lift your app with all recommended production settings (see deployment and scaling docs on sailsjs.org). But also make sure you tell Node.js to allow your code programmatic use of the garbage collector. For example, I like to do this by running:
NODE_ENV=production node --expose-gc app.js
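For reference, an endpoint that triggers the garbage collector could look roughly like the sketch below. This is not the code from sails-hook-dev; `forceGc` and `gcAction` are hypothetical names, and the handler assumes an Express/Sails-style `res` with `status()` and `json()`. The one thing it relies on is that `global.gc` is only defined when Node is started with `--expose-gc`, as in the command above.

```javascript
// Run the garbage collector if Node was started with --expose-gc.
function forceGc() {
  if (typeof global.gc !== 'function') {
    return { ok: false, reason: 'global.gc is unavailable; start node with --expose-gc' };
  }
  const beforeBytes = process.memoryUsage().heapUsed;
  global.gc();
  const afterBytes = process.memoryUsage().heapUsed;
  return { ok: true, beforeBytes, afterBytes };
}

// Hypothetical Express/Sails-style action wrapping forceGc().
// Only expose something like this in environments you control.
function gcAction(req, res) {
  const result = forceGc();
  return res.status(result.ok ? 200 : 500).json(result);
}
```

Returning the heap size before and after the run is optional, but it gives you a quick sanity check that the collector actually ran.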
4D. Do stuff or run load tests (round 1)
Now perform the behavior that you suspect will cause a memory leak. For example, that might be sending "POST" requests to a particular URL (use your app's UI or a tool like Postman). After about 30 minutes of this, if you take a look at the graphs, they might look slopey and bad, or they might look flat and good. But that doesn't matter at all: we have no idea whether there are memory leaks at this point.
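If clicking around in a UI or Postman for 30 minutes sounds tedious, a small script can send the requests for you. The sketch below uses only Node's built-in `http` module; `postOnce` and `hammer` are hypothetical names, and the URL and payload in the usage comment are placeholders for whatever endpoint you suspect.

```javascript
const http = require('http');

// Send one POST request and resolve with the response status code.
function postOnce(url, payload) {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify(payload);
    const req = http.request(url, {
      method: 'POST',
      agent: false, // use a fresh connection for each request
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body),
      },
    }, (res) => {
      res.resume(); // discard the response body so the socket is released
      res.on('end', () => resolve(res.statusCode));
    });
    req.on('error', reject);
    req.end(body);
  });
}

// Send `count` POST requests one after another and tally the status codes.
async function hammer(url, count, payload) {
  const tally = {};
  for (let i = 0; i < count; i++) {
    const status = await postOnce(url, payload);
    tally[status] = (tally[status] || 0) + 1;
  }
  return tally;
}

// Example (placeholder URL and payload):
// hammer('http://localhost:1337/user', 1000, { name: 'test' }).then(console.log);
```

Requests are sent sequentially on purpose, so each round produces a repeatable amount of work that you can run again unchanged later.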
4E. Force garbage collector to run (round 1)
Now hit your endpoint that runs the garbage collector. You'll see the graphs drop dramatically after a moment. Take note of the lowest point in the graph (the "valley"). This is the amount of memory that the process is using, and that could not be reclaimed using the garbage collector. This is the memory where stuff you're actually using lives-- e.g. the require() cache, and local variables that are still in use. Be sure to take a screenshot of the graphs and memory usage in GB at this point.
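Alongside the screenshot, it can help to record the numbers themselves. Here is a minimal sketch using `process.memoryUsage()`; the helper names `toMB` and `memorySnapshot` are hypothetical. Note that it reports on the process it runs in, so it has to be called from inside your app (for example, from the same endpoint that triggers the garbage collector).

```javascript
// Convert bytes to megabytes, rounded to one decimal place.
function toMB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 10) / 10;
}

// Capture a labelled reading of this process's current memory usage.
function memorySnapshot(label) {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  return {
    label,
    takenAt: new Date().toISOString(),
    rssMB: toMB(rss),
    heapTotalMB: toMB(heapTotal),
    heapUsedMB: toMB(heapUsed),
  };
}

// Log a reading right after the garbage collector has run.
console.log(memorySnapshot('valley 1'));
```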
4F. Do stuff or run load tests (round 2)
Now do exactly the same thing we did in step 4D again, for another 30 minutes.
4G. Force garbage collector to run (round 2)
Now do exactly the same thing we did in step 4E again. After a moment, if you notice that the "valley" in the graph is significantly higher than it was in step 4E, there might be something going on (there seems to be some amount of natural variation in what the garbage collector can actually reclaim-- in some cases this second "valley" is actually lower). Be sure to take a second screenshot of the graphs and memory usage in GB at this point.
Finally, if, after running through the steps above, it seems likely that there is a memory leak (i.e. the second "valley" was significantly higher than the first), then repeat steps 4F and 4G one more time to be sure.
5. Check out the "valleys" in your memory usage graph (the spots right after the garbage collector ran). If they're getting higher and higher, you've got a memory leak.
If there is a memory leak, you can expect the second "valley" to be significantly higher than the first, and the third "valley" to be significantly higher than the second.
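If you wrote down the valley readings, the comparison can be expressed as a few lines of code. This is only a sketch: `valleysKeepRising` is a hypothetical helper, and the 10% default for what counts as "significantly higher" is my own assumption, not a rule. Pick a threshold that fits the natural variation you see in your own graphs.

```javascript
// `valleysMB`: memory readings taken right after each forced GC run, in order.
// `minGrowth`: assumed threshold (as a fraction) for "significantly higher".
function valleysKeepRising(valleysMB, minGrowth = 0.1) {
  // Two valleys are not enough; the process above compares at least three.
  if (valleysMB.length < 3) {
    return false;
  }
  for (let i = 1; i < valleysMB.length; i++) {
    // Each valley must exceed the previous one by at least `minGrowth`.
    if (valleysMB[i] < valleysMB[i - 1] * (1 + minGrowth)) {
      return false;
    }
  }
  return true;
}

console.log(valleysKeepRising([100, 150, 210])); // each valley well above the last
console.log(valleysKeepRising([100, 104, 98]));  // within natural variation
```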
6. If there's a memory leak, fix it.
That's it. Whether there's a memory leak or not, you'll never get that hour back. But at least now you know.
Originally posted on GitHub.