
Recently I’ve noticed a lack of resources on advanced Node.js topics. There are plenty of guides and tutorials for getting started, but very little is written on maintainable design or scalable architecture. This post is a part of The Node.js Handbook, a series created to address this gap by sharing tribal knowledge and best practices. You can read more here.

Almost any Node.js developer can tell you what the require() function does, but how many of us actually know how it works? We use it every day to load libraries and modules, but its inner workings remain a mystery.

Curious, I dug into Node core to find out what was happening under the hood. But instead of finding a single function, I ended up at the heart of Node’s module system: module.js. The file contains a surprisingly powerful yet relatively unknown core module that controls the loading, compiling, and caching of every file used. require(), it turned out, was just the tip of the iceberg.

module.js

function Module(id, parent) {
  this.id = id;
  this.exports = {};
  this.parent = parent;
  // ...
}

The Module type found in module.js has two main roles inside of Node.js. First, it provides a foundation for all Node.js modules to build off of. Each file is given a new instance of this base module on load, which persists even after the file has run. This is why we are able to attach properties to module.exports and return them later as needed.

The module’s second big job is to handle Node’s module loading mechanism. The stand-alone require function that we use is actually an abstraction over module.require, which is itself just a simple wrapper around Module._load. This load method handles the actual loading of each file, and is where we’ll begin our journey.

Module._load

Module._load = function(request, parent, isMain) {
  // 1. Check Module._cache for the cached module. 
  // 2. Create a new Module instance if cache is empty.
  // 3. Save it to the cache.
  // 4. Call module.load() with the given filename.
  //    This will call module._compile() after reading the file contents.
  // 5. If there was an error loading/parsing the file, 
  //    delete the bad module from the cache
  // 6. return module.exports
};

Module._load is responsible for loading new modules and managing the module cache. Caching each module on load reduces the number of redundant file reads and can speed up your application significantly. In addition, sharing module instances allows for singleton-like modules that can keep state across a project.

If a module doesn’t exist in the cache, Module._load will create a new base module for that file. It will then tell the module to read in the new file’s contents before sending them to module._compile.[1]

If you notice step #6 above, you’ll see that module.exports is returned to the user. This is why you use exports and module.exports when defining your public interface, since that’s exactly what Module._load and then require will return. I was surprised that there wasn’t more magic going on here, but if anything that’s for the better.
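
To see what that caching buys us, here is a minimal sketch of the singleton-like behavior in practice (counter.js is just a hypothetical module):

// counter.js (hypothetical)
var count = 0;
module.exports.increment = function() { return ++count; };

// app.js
var counterA = require('./counter');
var counterB = require('./counter'); // served from Module._cache, no second file read

counterA.increment(); // returns 1
counterB.increment(); // returns 2 -- both names point to the same module instance

console.log(counterA === counterB); // true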

module._compile

Module.prototype._compile = function(content, filename) {
  // 1. Create the standalone require function that calls module.require.
  // 2. Attach other helper methods to require.
  // 3. Wrap the JS code in a function that provides our require,
  //    module, etc. variables locally to the module scope.
  // 4. Run that function.
};

This is where the real magic happens. First, a special standalone require function is created for that module. THIS is the require function that we are all familiar with. While the function itself is just a wrapper around module.require, it also contains some lesser-known helper properties and methods for us to use:

  • require(): Loads an external module
  • require.resolve(): Resolves a module name to its absolute path
  • require.main: The main module
  • require.cache: All cached modules
  • require.extensions: Available compilation methods for each valid file type, based on its extension
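
Here's a quick, hedged sketch of a few of these helpers in action (the 'express' module name and the printed path are just examples):

// Resolve a module name to the absolute path require() would load
console.log(require.resolve('express'));
// -> e.g. /my-project/node_modules/express/index.js

// Inspect every module loaded and cached so far
console.log(Object.keys(require.cache));

// List the file extensions Node.js currently knows how to compile
console.log(Object.keys(require.extensions)); // [ '.js', '.json', '.node' ]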

Once require is ready, the entire loaded source code is wrapped in a new function, which takes in require, module, exports, and all other exposed variables as arguments. This creates a new functional scope just for that module so that there is no pollution of the rest of the Node.js environment.

(function (exports, require, module, __filename, __dirname) {
  // YOUR CODE INJECTED HERE!
});

Finally, the function wrapping the module is run. The entire Module._compile method is executed synchronously, so the original call to Module._load just waits for this code to run before finishing up and returning module.exports back to the user.

Conclusion

And so we’ve reached the end of the require code path, and in doing so have come full circle by creating the very require function that we had begun investigating in the first place.

If you’ve made it all this way, then you’re ready for the final secret: require('module'). That’s right, the module system itself can be loaded VIA the module system. INCEPTION. This may sound strange, but it lets userland modules interact with the loading system without digging into Node.js core. Popular modules like mockery and rewire are built off of this.[2]
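
As a small, hedged illustration, here is what poking at the loader through its own front door might look like. Keep in mind that the underscore-prefixed properties are private and undocumented:

var Module = require('module'); // the module system, loaded by the module system

// The same cache exposed publicly as require.cache
console.log(Object.keys(Module._cache).length);

// Module._load is the private entry point that require() eventually calls
console.log(typeof Module._load); // 'function'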

If you want to learn more, check out the module.js source code for yourself. There is plenty more there to keep you busy and blow your mind. Bonus points for the first person who can tell me what ‘NODE_MODULE_CONTEXTS’ is and why it was added.


[1] The module._compile method is only used for running JavaScript files. JSON files are simply parsed and returned via JSON.parse().

[2] However, both of these modules are built on private Module methods, like Module._resolveLookupPaths and Module._findPath. You could argue that this isn’t much better…

Recently I’ve noticed a lack of resources on advanced Node.js topics. There are plenty of guides and tutorials for getting started, but very little is written on maintainable design or scalable architecture. This post is a part of The Node.js Handbook, a series created to address this gap by sharing best practices. You can read more here.

Testing can be a tricky topic no matter what language we’re in. Javascript’s flexibility makes it easy to get started, but can leave us tearing our hair out days later. How do we handle an API callback? How do we deal with require? Without a proper setup, whether TDD is dead or not will end up meaning very little.

This post will explain the tools needed to overcome the challenges of testing with Node.js. Together, they form an essential testing suite that will cover almost any project. The setup isn’t the most complex or feature-rich, but you could almost say that’s on purpose. If that sounds counter-intuitive… read on.

Introduction: Zero Points for Clever Tests

Before introducing the tools, it’s important to emphasize the reason we write tests in the first place: confidence. We write tests to inspire confidence that everything is working as expected. If something breaks we want to be sure we’ll catch it, and quickly understand what went wrong. Every line of every single test file should be written for this purpose.

The problem is that modern frameworks have gotten incredibly clever. This is ultimately a good thing, but it means we’ll need to be careful: this extra power is easily gained at the expense of clarity. Our tests may run faster or have more reusable code, but does that make us more or less confident in what is actually being tested? Always remember: There are no points for clever tests.

Test clarity should be valued above all else. If our framework obfuscates this in the name of efficiency or cleverness, then it is doing us a disservice.

The Essential Toolkit

With that out of the way, let’s introduce the four types of tools needed for successful Node.js testing:

  • A Testing Framework (Mocha, Vows, Intern)
  • An Assertion Library (Chai, Assert)
  • Stubs (Sinon)
  • Module Control (Mockery, Rewire)

A Testing Framework

The first and most important thing we’ll need is a testing framework. A framework will be our bedrock, providing a clear and scalable structure for our tests. We have a ton of options here, each with a different feature set and design. No matter which framework you go for, make sure you choose one that supports our mission: writing clear, maintainable tests.

For Node.js, Mocha is the gold standard. It has been around forever, and is well tested and maintained. Its customization options are extensive, which makes it incredibly flexible as well. While the framework is far from sexy, its setup/teardown pattern encourages explicit, understandable, and easy-to-follow tests.

describe('myModule', function() {
  before(function(){
    // before() is the first thing we run before all your tests. Do one-time setup here.
  });
  it('does x when y', function(){
    // Now... Test!
  });
  after(function() {
    // after() is run after all your tests have completed. Do teardown here.
  });
});

An Assertion Library

With a new testing framework in place, we’re ready to write some tests. The easiest way to do that is with an assertion library.

assert(object.isValid, 'tests that this property is true, and throws an error if it is false');

There are a ton of different libraries and syntax styles available for us to use. TDD, BDD, assert(), should()… the list goes on. BDD has been gaining popularity recently thanks to its natural-language structure, but it should all come down to what feels best to you. Chai is a great library for experimenting because it supports most of the popular assertion styles. But if you’re a dependency minimalist, Node.js comes bundled with a simple assertion library as well.

var expect = require('chai').expect, assert = require('assert');

expect(42).to.equal(42); // BDD Assertion Style
assert.equal(42, 42);    // TDD Assertion Style

Stubs

Unfortunately, assertions alone will only get us so far. When testing more complex functions, we’ll need a way to influence behavior and test code under explicit conditions. While it’s important to always stay true to the original behavior, sometimes we need to be certain that some function will return true, or that an API call will yield with an expected value. Sinon allows us to do this easily.

var sinon = require('sinon');

var callback = sinon.stub();
callback.withArgs(42).returns(1);
callback.withArgs(1).throws("TypeError");

callback();   // No return value, no exception
callback(42); // Returns 1
callback(1);  // Throws TypeError

Sinon includes a collection of other useful tools for testing, such as fake timers and argument matchers. In addition to Stubs, there are also Spies (a smaller subset of stub features for measuring function calls) and Mocks (for setting expectations on behavior) to experiment with. An entire book could be written on all of Sinon’s features, but you can always start simple and experiment as you go.
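
For instance, here is a minimal sketch of a Spy verifying how a callback was invoked (saveUser() is just a hypothetical function under test):

var sinon = require('sinon');

// A hypothetical function under test that reports back via a callback
function saveUser(user, callback) {
  callback(null, user);
}

var onSave = sinon.spy();
saveUser({ name: 'Fred' }, onSave);

console.log(onSave.called);           // true
console.log(onSave.callCount);        // 1
console.log(onSave.calledWith(null)); // true -- called error-first with no error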

Module Control

We’re almost ready to start writing tests, but there’s still one last problem in our way: require(). Because most calls to require happen privately, we have no way to stub, assert, or otherwise access external modules. To really control our tests, we’ll need the option to override require to return modules under our control.

There are a few different ways to accomplish this, depending on how much power is needed. Mockery gives us control of the module cache, and lets us replace entries with modules of our own. It’s a cautious library, and will warn developers when it thinks they’ve done something unintentional, like overwriting or forgetting to replace certain mocks. Just be sure to disable & deregister mocks after the tests have run.

var mockery = require('mockery');
var stubbedModule = { /* our stand-in for '../some-other-module' */ };

before(function() {
  mockery.enable();
  // Allow some ...
  mockery.registerAllowable('async');
  // ... Control others
  mockery.registerMock('../some-other-module', stubbedModule);
});
after(function() {
  mockery.deregisterAll();
  mockery.disable();
});

Rewire is another popular tool that is much more powerful than Mockery. We can get and set private variables within the file, inject new code, replace old functions, and otherwise modify the original file however we’d like. This may sound like a better deal, but with all this power comes the cliched responsibility. Just because we can check/set a private variable doesn’t mean we should. These additional abilities move our tested code farther away from the original, and can easily get in the way of writing good tests.
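
As a rough sketch of that power (the module path and variable names here are hypothetical):

var rewire = require('rewire');

// Load the module under test with rewire instead of require
var userStore = rewire('./lib/user-store');

// Read or replace private, un-exported variables inside the file
console.log(userStore.__get__('cacheSize'));
userStore.__set__('cacheSize', 0);

// Swap out an internal function entirely
userStore.__set__('saveToDisk', function(data, callback) { callback(null); });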

Bringing it all Together

To see these tools all working together check out a working example on GitHub. While I singled out a few favorite libraries above, there are plenty of good alternatives in each of the categories listed. Think I missed something important? Let me know in the comments, or on Twitter at @FredKSchott.

NodeUp Episode #61 is now available at nodeup.com. I had a blast talking with Forrest Norvell, Trevor Norris, Dan Peddle, and host Daniel Shaw about CLS, AsyncListeners, and what it all means for Node.js v0.12 and beyond.

Visit the site or grab the mp3 directly and hear a voice that was made for radio (but in the good way).

Recently I’ve noticed a lack of resources on advanced Node.js topics. There are plenty of tutorials on the basics, but you’ll find very little written on maintainable design or scalable architecture. I created The Node.js Handbook to address this gap. You can read all past topics here.

If the V8 Engine is the heart of your Node.js application, then your callbacks are its veins. They enable a balanced, non-blocking flow of control and processing power across applications and modules. But for callbacks to work at scale, developers needed a common, reliable protocol. The “error-first” callback (also known as an “errorback”, “errback”, or “node-style callback”) was first introduced to solve this problem, and has since become the standard protocol for Node.js callbacks. This post will define this pattern, its proper use, and exactly why it is so powerful.

Why Standardize?

Node’s heavy use of callbacks dates back to a style of programming older than Javascript itself. Continuation-Passing Style (CPS) is the old-school name for how Node.js uses callbacks today. In CPS, a “continuation” function is passed as an argument to be called once the rest of the code has been run. This allows for different functions to safely pass control asynchronously back and forth across an application.
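
Here's a tiny sketch of the idea, using a made-up add() function that hands its result to a continuation instead of returning it:

// Instead of returning a value, add() passes control to its continuation
function add(a, b, continuation) {
  continuation(a + b);
}

add(1, 2, function(result) {
  console.log(result); // 3
});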

Almost all Node.js code follows this style, so having a dependable callback pattern is crucial. Without one, developers would be stuck maintaining different signatures and styles for each and every callback. The error-first pattern was introduced into Node.js core to solve this very problem, and has since spread to become today’s standard. While every use-case has different requirements and responses, the error-first pattern can accommodate them all.

Defining an Error-First Callback

There’s really only one rule for using an error-first callback:

  1. The first argument of the callback is always reserved for an error object. The following arguments will contain any other data that should be returned to the callback. There is almost always just one object following ‘err’, but you can use multiple arguments if truly needed.
    Example: function(err, data)

When it’s time to call an error-first callback, there are two scenarios you’ll need to handle:

  1. On a successful response, the ‘err’ argument is null. Call the callback and include the successful data only.
    Example: callback(null, returnData);

  2. On an unsuccessful response, the ‘err’ argument is set. Call the callback with an actual error object. The error should describe what happened and include enough information to tell the callback what went wrong. Data can still be returned in the other arguments as well, but generally the error is passed alone.
    Example: callback( new Error('Bad Request') );
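
Putting both scenarios together, here's a minimal sketch of a function that honors the error-first contract (divide() is just a made-up example):

function divide(a, b, callback) {
  if (b === 0) {
    // Unsuccessful: pass an actual Error object as the first argument
    return callback(new Error('Cannot divide by zero'));
  }
  // Successful: err is null, and the data follows
  callback(null, a / b);
}

divide(10, 2, function(err, result) {
  if (err) return console.log(err.message);
  console.log(result); // 5
});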

That’s it! Easy, right?

Implementing an Error-First Callback

var fs = require('fs');

fs.readFile('/foo.txt', function(err, data) {
  // File Returned
  console.log(data);
});

Above you have an error-first callback in action. readFile takes in a file path to read, and calls your callback once it has finished. If all goes well, the file data is returned in the data argument. But if something goes wrong, readFile won’t have any data to return. In that case, the first argument will be populated with an error.

fs.readFile('/foo.txt', function(err, data) {
  if (err) {
    console.log('Ahh! An Error!');
    return;
  }
  console.log(data);
});

It’s up to you to check for that error, and handle it as best you can. The error object will usually contain some extra information describing what went wrong, which you can use to detect and safely handle all possible problems.

fs.readFile('/foo.txt', function(err, data) {
  if (err) {
    if (err.fileNotFound) {
      console.log("File Doesn't Exist");
      return;
    }
    if (err.noPermission) {
      console.log('No Permission');
      return;
    }
    console.log('Unknown Error');
    return;
  }
  console.log(data);
});

Unfortunately the information isn’t normally as explicit as .fileNotFound, but handling errors properly is always worth the investment. Ignoring an error entirely can be disastrous for your application. Don’t cut corners on this.

Err-ception: Propagating Your Errors

The error-first pattern is the backbone for safe communication between Node.js modules. When a method passes its errors to a callback, it no longer has to make assumptions on how an error should be handled. Instead of choosing between throwing or ignoring, errors get handled at the level that makes sense.

When you’re consistent with this pattern, errors can be propagated up as many times as you’d like. This is especially important when throwing isn’t possible, like within a web application. If some method threw an error while attempting to read a file, the whole server would crash. By propagating it up instead, the callback has a chance to handle the error. And if that specific route doesn’t know how to handle it, it can propagate up even farther to be handled with a 500 or 404 error page.

if(err) {
  // Handle "Not Found"
  if(err.fileNotFound) return res.send('File Does not Exist');
  // Handle "No Permission"
  if(err.noPermission) return res.send('No Permission');
  // Propagate Other Errors, Express Will Catch Them
  return next(err);
}

Slow Your Roll, Control Your Flow

With a solid callback protocol in hand, there are endless possibilities for controlling your asynchronous flow. There are parallel flows, waterfall flows, and many more creative ways to direct asynchronous code. If you want to read in 10 different files, or make 100 different API calls, you no longer have to make them one at a time.

The async library can handle all this and more. And because it uses error-first callbacks, it’s incredibly easy to work with. Async can simplify and speed up your code, and hopefully keep you out of callback hell….
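
For example, here's a small sketch of async.parallel reading two (hypothetical) files at once and reporting back through a single error-first callback:

var async = require('async');
var fs = require('fs');

async.parallel([
  function(callback) { fs.readFile('/foo.txt', callback); },
  function(callback) { fs.readFile('/bar.txt', callback); }
], function(err, results) {
  // Called once both reads finish, or as soon as either one fails
  if (err) return console.log('A read failed:', err);
  console.log(results); // [ <foo.txt contents>, <bar.txt contents> ]
});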

Bringing it all Together

To see all these concepts come together, check out some more examples on GitHub. And of course, you can always choose to ignore all of this callback stuff and go fall in love with promises… but that’s a whole other post entirely :)

Whether you love terminal customization or feel more Rand Paul with your ‘.bash_profile’, this tip will change your life. This is one of the first commands I run on a new machine, and it has been an invaluable tool for as long as I can remember. 

git config --global alias.lg "log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr)%C(bold blue)<%an>%Creset' --abbrev-commit"

― via Filipe Kiss

This command creates a new ‘git lg’ alias to use instead of the default ‘git log’. What does running git lg do exactly? It turns this mess:

git log example

into this:

git lg example

Beautiful, isn’t it? You get more information above the fold by removing the noise. And as an added bonus, you can see your branches ebb and flow across your repository. This can be super helpful for navigation and rebasing, and invaluable when used in merge-based workflows like GitHub’s.

#lifehackz.


Follow @FredKSchott for more discussion and git tips. Got any favorites of your own? I’d love to see them. Leave a comment below and spread the productivity/joy.