Wednesday, April 10, 2013

Exceptions and resource leaks in Javascript and Node.js

A while ago I read a comment in the Node.js documentation on uncaughtException mentioning that it may be removed (no longer true), that it is a crude mechanism for exception handling that shouldn’t be used, and that domains should be used instead. So I made a mental note and figured I’d look into domains when I got the chance. This past week I started looking into them a bit and noticed some more concerning comments in the documentation for domains:

By the very nature of how throw works in JavaScript, there is almost never any way to safely “pick up where you left off”, without leaking references, or creating some other sort of undefined brittle state.

In the comment in the code block a little below that, where it’s demonstrating a bad way to use domains as a global catch-all like uncaughtException it states:

The error won’t crash the process, but what it does is worse! Though we’ve prevented abrupt process restarting, we are leaking resources like crazy if this ever happens.

They recommend that when you catch exceptions in a global way like this you always shut down and restart the process. My first reaction was something like “Whoa! Hold on there Skippy!” I’ve written a server that handles a lot of persistent websocket connections and I can’t just shut down because I didn’t handle some exception thrown while handling a misbehaving client. And what the heck is it talking about with this whole “leaking resources like crazy” thing anyway?

So naturally I did some Googling to see if I could figure out what it’s talking about. Unfortunately I couldn’t find anything useful. I then posted a question on Stack Overflow, which was helpful. While the answers there do seem kind of obvious in retrospect, they weren’t obvious to me at the time because the resources mentioned in the answers (open files and sockets) don’t apply to the server I’ve written (another reason why it’s bad to blindly tell everyone they’re leaking resources).

My purpose in writing this post is mostly to help answer the question I had about what resources might be leaking and how to write code taking this into account. The Node.js documentation is misleading and non-descriptive. The only time it’s definitely bad to continue after an exception is if you literally have no idea where it was thrown and what resources might have been in use (e.g. global exception handlers).

When an exception is thrown while you have an open stream (file, socket, etc.), the code that follows, which would normally finish with the resource and close it, never runs. That means it’s left open and you are now leaking resources. When you throw an exception in a block of code where resources are open, you are introducing leaks.

The only way to make sure you aren’t leaking resources is on a case-by-case basis… sort of.

The problem with exceptions can be handled pretty well with domains. However, you run into the exact same problem if you simply forget to close a resource or return too early, while it’s still open.

Some Solutions

Ideally you shouldn’t have unhandled exceptions. We don’t live in an ideal world, but it’s still good to try to handle as many as you can. If you catch the exception close enough to where it occurred that you know what resources are in use and can clean them up properly then it’s alright to continue. If you write code that throws exceptions, keep in mind what resources are open at the time you throw the exception and try to clean them up before throwing.
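To make the difference concrete, here is a minimal sketch of cleaning up close to where the exception occurs. The openResource() function and its counter are made up for illustration (they stand in for a file or socket); the only real language feature in play is try/finally.

```javascript
// Hypothetical resource used only for illustration: it tracks how many
// handles are currently open so that a leak is visible.
let openCount = 0;

function openResource(name) {
  openCount += 1;
  let closed = false;
  return {
    name: name,
    close: function () {
      if (!closed) {
        closed = true;
        openCount -= 1;
      }
    }
  };
}

// Leaky: if work() throws, close() is never reached.
function leaky(work) {
  const resource = openResource('leaky');
  const result = work(resource); // a throw here skips the close() below
  resource.close();
  return result;
}

// Safer: finally runs on normal completion, early return, and throw alike.
function safe(work) {
  const resource = openResource('safe');
  try {
    return work(resource);
  } finally {
    resource.close();
  }
}
```

Calling leaky() with a function that throws leaves openCount one higher than before, while safe() closes the resource and still lets the exception propagate to whoever is able to handle it.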

You can use domains in Node.js to try to have some context and specificity to your error handling. For example, if you were writing an HTTP server, you could use domains to handle the case where an exception is thrown while processing the request (even if it’s doing asynchronous stuff) and then at least shut down the open socket from the client and clean up the request a bit. Otherwise, since an exception was thrown, no response would be sent and the request would be left open until it timed out.
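As a rough sketch of that idea, the wrapper below runs each request handler in its own domain and answers with a 500 if anything throws, even asynchronously. The name withRequestDomain is my own invention, and only domain.create(), add(), run() and the 'error' event come from the domain module itself. The domain module has since been deprecated in Node.js, so treat this as an illustration of the approach rather than a recommendation.

```javascript
const domain = require('domain');

// Hypothetical wrapper: runs each request handler inside its own domain so
// that an error can be tied back to the request that caused it.
function withRequestDomain(handler) {
  return function (req, res) {
    const d = domain.create();
    // Errors emitted by these two emitters are routed to this domain.
    d.add(req);
    d.add(res);

    d.on('error', function (err) {
      console.error('request failed:', err.message);
      try {
        if (!res.headersSent) {
          res.statusCode = 500;
        }
        // End the response so the request is not left open until it times out.
        res.end('Internal Server Error\n');
      } catch (err2) {
        console.error('could not send error response:', err2.message);
      }
    });

    // Timers and callbacks created while the handler runs belong to this domain.
    d.run(function () {
      handler(req, res);
    });
  };
}
```

It would be used as http.createServer(withRequestDomain(myHandler)), where myHandler is whatever request handler you already have. This only cleans up the request itself; anything else the handler had open at the time still has to be dealt with separately.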

If you are writing a server or process that can fork or spawn workers (such as a request handler for a web server or something else) then the problem can be largely mitigated by simply shutting down the worker (hopefully gracefully) and spawning new workers as needed.

An Ideal Solution

Though it doesn’t exist in JavaScript, an ideal solution would be something like Python’s context managers. They are a powerful tool for managing resources. As mentioned here:

The advantage of using a with statement is that it is guaranteed to close the file no matter how the nested block exits. If an exception occurs before the end of the block, it will close the file before the exception is caught by an outer exception handler. If the nested block were to contain a return statement, or a continue or break statement, the with statement would automatically close the file in those cases, too.

Using the with statement in Python (for those unfamiliar) looks like this:

with open('/path/to/file', 'w') as f:
    f.write('some text')

That’s it! There is no need to bother closing the file. Anything that needs to be done with the file is done within the with block. When it exits the scope the context manager closes the file. And the best part is that you can write your own context manager, just like open() in this example! You could write one for handling sockets, including sockets that are already open.

Though there is nothing like this in JavaScript, I think it’d definitely be a nice addition. Even if it were never to become a part of JavaScript itself, it would be a particularly nice addition to Node.js. Unlike JavaScript in the web browser, JavaScript on the server handles a lot more resources like this (open files and sockets), so it is much more important to be able to handle resources in an elegant and clean way without having to shut down every time an unhandled exception occurs.

That said, I don’t know exactly how they would implement something like context managers with the asynchronicity of JavaScript. I suppose when there are no more references to a resource it’s essentially “out of scope” and the exit on a context manager could be called to close the resource. In that case it would have to run as part of the garbage collection process, and you really cannot know when it will run, which could produce other unpredictable and/or undesirable behavior.
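For what it’s worth, something in the spirit of a context manager can be approximated without relying on garbage collection by scoping the resource to a function call. This is only a sketch, assuming promises and async functions are available; withResource and writeSomeText are names I made up, not part of JavaScript or Node.js.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: acquire a resource, hand it to `use`, and always
// release it afterwards, whether `use` returns, throws, or rejects.
async function withResource(acquire, release, use) {
  const resource = await acquire();
  try {
    return await use(resource);
  } finally {
    await release(resource);
  }
}

// Rough equivalent of the Python example above. It is defined here but not
// called, since the path is only a placeholder.
function writeSomeText(path) {
  return withResource(
    function () { return fs.open(path, 'w'); },
    function (handle) { return handle.close(); },
    function (handle) { return handle.write('some text'); }
  );
}
```

The resource is released as soon as the promise returned by the use function settles, so the cleanup point is predictable. The catch is that it only works if everything that touches the resource happens inside that function and is awaited there.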

Conclusion

Keep in mind that this post only really addresses concerns relating to leaking resources, not the brittle state of the application after an unhandled exception is caught for the first time globally, and thus without context. In that situation it is difficult to know the state of the application. Whatever function threw the exception (which you don’t know, since you have no context) could have modified shared state and not finished its work, leaving that state unknown. Global exception handlers really should only be used for logging uncaught exceptions so you can fix them and for shutting down gracefully. That said, it’s not impossible, especially in smaller systems, to verify from a global exception handler that the system is in a sane state.

For an interesting and brief explanation of things to consider with exception handling, check out this explanation from an answer to my SO question. Trust me, you want to click that link!