The Node.js Way - How `require()` Actually Works

June 9, 2014 (Updated December 18, 2014)

Update July 28, 2014: I just gave a talk at BayNode on this exact subject, which includes a walkthrough of all the code discussed in this post. If talks & slides are more your style, check it out.

Almost any Node.js developer can tell you what the require() function does, but how many of us actually know how it works? We use it every day to load libraries and modules, but beyond that its inner workings remain a mystery.

Curious, I dug into Node core to find out what was happening under the hood. But instead of finding a single function, I ended up at the heart of Node’s module system: module.js. The file contains a surprisingly powerful yet relatively unknown core module that controls the loading, compiling, and caching of every file used. require(), it turned out, was just the tip of the iceberg.

module.js

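function Module(id, parent) {
  this.id = id;          // identifier for this module (usually its filename)
  this.exports = {};     // the value require() will ultimately return
  this.parent = parent;  // the module that first required this one
  // ... (filename, loaded, children, and more)
}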

The Module type found in module.js has two main roles inside of Node.js. First, it provides a foundation for all Node.js modules to build on. Each file is given a new instance of this base module on load, which persists even after the file has run. This is why we are able to attach properties to module.exports and return them later as needed.

Module’s second big job is to handle Node’s module-loading mechanism. The standalone require function that we use is actually an abstraction over module.require, which is itself just a simple wrapper around Module._load. This load method handles the actual loading of each file, and is where we’ll begin our journey.

Module._load

Module._load = function(request, parent, isMain) {
  // 1. Check Module._cache for the cached module.
  // 2. Create a new Module instance if it isn't cached yet.
  // 3. Save it to the cache.
  // 4. Call module.load() with the given filename.
  //    This will call module._compile() after reading the file contents.
  // 5. If there was an error loading/parsing the file,
  //    delete the bad module from the cache.
  // 6. Return module.exports.
};

Module._load is responsible for loading new modules and managing the module cache. Caching each module on load reduces the number of redundant file reads and can speed up your application significantly. In addition, sharing module instances allows for singleton-like modules that can keep state across a project.
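To see the cache and singleton behavior with real files, the sketch below writes a small stateful module to a temporary directory (the `counter.js` name is made up for the demo), requires it twice, then evicts it from `require.cache`. It assumes a Node version with `fs.rmSync` (14.14 or later) for cleanup:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write a tiny stateful module to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "require-cache-demo-"));
const counterPath = path.join(dir, "counter.js");
fs.writeFileSync(
  counterPath,
  "let count = 0;\nmodule.exports = { increment: () => ++count };\n"
);

// Two require() calls hit the cache and share one instance (and its state).
const a = require(counterPath);
const b = require(counterPath);
a.increment();
console.log(a === b, b.increment()); // true 2

// Deleting the cache entry (keyed by resolved filename) forces a fresh load.
delete require.cache[require.resolve(counterPath)];
const c = require(counterPath);
console.log(c === a, c.increment()); // false 1

fs.rmSync(dir, { recursive: true, force: true });
```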

If a module doesn’t exist in the cache, Module._load will create a new base module for that file. It will then tell the module to read in the new file’s contents before sending them to module._compile.[1]
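The six steps can be modeled with a toy loader. This is a hypothetical re-creation, not Node's code: the in-memory `sources` map stands in for the file system, filenames are used as-is with no path resolution, and `load` and `cache` are illustrative names rather than Node internals:

```javascript
// Toy model of the Module._load steps; NOT Node's real implementation.
// Hypothetical in-memory "files" replace reading from disk.
const sources = {
  "/app/config.js": "module.exports = { env: 'dev' };",
  "/app/broken.js": "throw new Error('boom');",
};

const cache = {};

function load(filename) {
  // 1. Return the cached exports if this file was already loaded.
  if (cache[filename]) return cache[filename].exports;
  if (!(filename in sources)) {
    throw new Error(`Cannot find module '${filename}'`);
  }

  // 2-3. Create a fresh module object and cache it before running it.
  const mod = { id: filename, exports: {}, loaded: false };
  cache[filename] = mod;

  try {
    // 4. "Read" the source and run it inside a wrapper function.
    const wrapper = new Function("exports", "require", "module", sources[filename]);
    wrapper(mod.exports, load, mod);
    mod.loaded = true;
  } catch (err) {
    // 5. Evict the half-loaded module so a later call can retry.
    delete cache[filename];
    throw err;
  }

  // 6. Hand back whatever the module assigned to module.exports.
  return mod.exports;
}

console.log(load("/app/config.js") === load("/app/config.js")); // true (cached)
try {
  load("/app/broken.js");
} catch (e) {
  console.log(e.message); // boom
}
console.log("/app/broken.js" in cache); // false
```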

Look at step #6 above and you’ll see that module.exports is what gets returned to the user. This is why you use exports and module.exports when defining your public interface: that’s exactly what Module._load, and then require, will return. I was surprised that there wasn’t more magic going on here, but if anything that’s for the better.
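A small simulation makes the difference between `exports` and `module.exports` concrete. `runModule` is a hypothetical helper that mimics the wrapper by passing both to a module body and then returning `module.exports`, the same value require() hands back:

```javascript
// Hypothetical helper: run a module body the way the wrapper does,
// then return module.exports just as require() would.
function runModule(body) {
  const mod = { exports: {} };
  body(mod.exports, mod);
  return mod.exports;
}

// Adding properties to `exports` works: it is the same object as module.exports.
const a = runModule((exports, module) => { exports.name = "a"; });

// Replacing module.exports works: the new value is what gets returned.
const b = runModule((exports, module) => { module.exports = () => "b"; });

// Reassigning `exports` only rebinds a local variable, so it is lost.
const c = runModule((exports, module) => { exports = { name: "c" }; });

console.log(a.name, b(), c.name); // a b undefined
```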

module._compile

Module.prototype._compile = function(content, filename) {
  // 1. Create the standalone require function that calls module.require.
  // 2. Attach other helper methods to require.
  // 3. Wrap the JS code in a function that provides our require,
  //    module, etc. variables locally to the module scope.
  // 4. Run that function.
};
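Those four steps can be sketched with Node's `vm` module. This is a simplified, hypothetical `compileModule`, not Node's implementation: the module is a plain object, and its `require` just delegates to the calling file's own `require`, so relative paths would resolve against the calling file rather than `filename`:

```javascript
const path = require("path");
const vm = require("vm");

// Simplified sketch of the _compile steps; NOT Node's actual source.
function compileModule(content, filename) {
  const mod = { id: filename, filename, exports: {} };

  // 1-2. A per-module require with a couple of helpers attached.
  //      (Delegates to this file's require, so resolution is simplified.)
  const localRequire = (id) => require(id);
  localRequire.resolve = (id) => require.resolve(id);
  localRequire.cache = require.cache;

  // 3. Wrap the source so its variables stay local to the module.
  const wrapped =
    "(function (exports, require, module, __filename, __dirname) {\n" +
    content +
    "\n})";
  const fn = vm.runInThisContext(wrapped, { filename });

  // 4. Run it synchronously with `this` bound to module.exports,
  //    then hand back module.exports.
  fn.call(mod.exports, mod.exports, localRequire, mod, filename, path.dirname(filename));
  return mod.exports;
}

const result = compileModule(
  "const path = require('path');\nmodule.exports = path.basename(__filename);",
  "/tmp/hypothetical/example.js"
);
console.log(result); // example.js
```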

This is where the real magic happens. First, a special standalone require function is created for that module. THIS is the require function that we are all familiar with. While the function itself is just a wrapper around module.require, it also carries some lesser-known helper properties and methods for us to use:

require(): Loads an external module
require.resolve(): Resolves a module name to its absolute path
require.main: The main module
require.cache: All cached modules
require.extensions: Available compilation methods for each valid file type, based on its extension

Once require is ready, the entire loaded source code is wrapped in a new function, which takes in require, module, exports, and all other exposed variables as arguments. This creates a new functional scope just for that module so that there is no pollution of the rest of the Node.js environment.

(function (exports, require, module, __filename, __dirname) {
  // YOUR CODE INJECTED HERE!
});
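The `vm` module makes the benefit of that wrapper visible. In this sketch (variable names are illustrative), a top-level `var` run as a bare script becomes a property of the global object, while the same declaration inside the wrapper stays private:

```javascript
const vm = require("vm");

// Unwrapped: a top-level `var` in a script becomes a global property.
vm.runInThisContext("var leakedCounter = 1;");

// Wrapped: the same kind of declaration is local to the wrapper function.
const wrapper = vm.runInThisContext(
  "(function (exports, require, module, __filename, __dirname) {\n" +
    "  var privateCounter = 1;\n" +
    "})"
);
wrapper({}, require, { exports: {} }, "demo.js", ".");

console.log(typeof globalThis.leakedCounter);  // number
console.log(typeof globalThis.privateCounter); // undefined
```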

Finally, the function wrapping the module is run. The entire Module._compile method is executed synchronously, so the original call to Module._load just waits for this code to run before finishing up and returning module.exports back to the user.
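To watch that ordering, this sketch requires a temporary module whose body reports its progress through a hypothetical global callback, `__recordLoadStep`, added purely for the demo:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const order = [];
// Hypothetical global hook so the temp module can report back (demo only).
globalThis.__recordLoadStep = (step) => order.push(step);

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "require-sync-demo-"));
const modPath = path.join(dir, "slow.js");
fs.writeFileSync(
  modPath,
  "__recordLoadStep('module body start');\n" +
    "for (let i = 0; i < 1e6; i++) {} // busy work\n" +
    "__recordLoadStep('module body end');\n" +
    "module.exports = 'done';\n"
);

order.push("before require");
const value = require(modPath); // blocks until the module body has finished
order.push(`after require: ${value}`);

console.log(order);
// [ 'before require', 'module body start', 'module body end', 'after require: done' ]

delete globalThis.__recordLoadStep;
fs.rmSync(dir, { recursive: true, force: true });
```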

Conclusion

And so we’ve reached the end of the require code path, and in doing so have come full circle by creating the very require function that we had begun investigating in the first place.

If you’ve made it all this way, then you’re ready for the final secret: require('module'). That’s right, the module system itself can be loaded via the module system. INCEPTION. This may sound strange, but it lets userland modules interact with the loading system without digging into Node.js core. Popular modules like mockery and rewire are built on this.[2]
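The core trick behind such tools can be sketched by temporarily patching `Module.prototype.require`. This is not mockery's or rewire's actual code, just an illustration of the idea; `withMock` and the `hypothetical-db` module name are made up, and patching loader internals like this may behave differently across Node versions:

```javascript
const Module = require("module");

// Hypothetical stand-in for a real dependency; "hypothetical-db" does not exist.
const fakeDb = { query: () => "fake result" };

// Temporarily route require(name) to a replacement, then restore the original.
function withMock(name, replacement, fn) {
  const originalRequire = Module.prototype.require;
  Module.prototype.require = function (id) {
    if (id === name) return replacement;
    return originalRequire.apply(this, arguments);
  };
  try {
    return fn();
  } finally {
    Module.prototype.require = originalRequire;
  }
}

const result = withMock("hypothetical-db", fakeDb, () =>
  require("hypothetical-db").query()
);
console.log(result); // fake result
```

Restoring the original in `finally` matters: the patch is process-wide, so leaving it in place would affect every later require() call.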

If you want to learn more, check out the module.js source code for yourself. There is plenty more there to keep you busy and blow your mind. Bonus points for the first person who can tell me what ‘NODE_MODULE_CONTEXTS’ is and why it was added.

[1] The module._compile method is only used for running JavaScript files. JSON files are simply parsed with JSON.parse() and returned.

[2] However, both of these modules are built on private Module methods, like Module._resolveLookupPaths and Module._findPath. You could argue that this isn’t much better…