The first thing to know about Node.js is that it is a platform, not a framework. It contains many components for developing, testing, and deploying enterprise applications. It’s an open-source, cross-platform JavaScript runtime.
The Node.js project began in 2009 as a JavaScript environment decoupled from the browser. Using Google’s V8 and Marc Lehmann’s libev, Node.js combined a model of event-driven I/O with a language (JavaScript) that was well suited to this style of programming, since developers were already familiar with event-driven programming from the browser.
Many (if not most) Node.js developers already know JavaScript before learning Node.js. Consequently, they often start by using components and libraries such as Express.js, Sequelize, Mongoose, Socket.IO, and other well-known packages instead of investing time in learning Node.js itself and its standard APIs.
However, the Node.js stack comprises a lot more, and knowing this and understanding its built-in APIs may help to avoid many common mistakes and improve execution performance. Any Node.js application contains the following components.
The application
Your application’s code, written in JavaScript, along with any custom JavaScript libraries.
JavaScript Modules and Libraries
Node.js has a set of built-in modules which you can use without any further installation, as well as many other modules that must be installed (usually with npm, Node’s standard package manager). Applications access these modules using JavaScript functions and class methods, depending on the module.
C/C++ bindings
Wrappers around C/C++ modules, built with Node-API (N-API), a C API for building native Node.js addons. Node.js addons are dynamically-linked shared objects, written in C or C++, that can be loaded into Node.js using the require() function and used as if they were ordinary Node.js modules. They primarily provide an interface between JavaScript running in Node.js and C/C++ libraries.
Google V8 JavaScript Engine
Google’s V8 is the JavaScript engine inside Node.js that parses and runs your JavaScript; it is the same engine that runs JavaScript in the Chrome browser. Google open-sourced V8, and the builders of Node.js adopted it to run JavaScript in Node.js.
Note: There is an effort by Microsoft to allow the Chakra JavaScript engine (that’s the engine in Edge) to be used with Node.js but it’s still in an experimental stage.
libuv (Unicorn Velociraptor Library)
A multi-platform C library that provides support for asynchronous I/O based on event loops. It was primarily developed for use by Node.js, but it’s also used in other tools such as Julia, Luvit, pyuv, and others. Node.js uses this library to abstract I/O operations to a unified interface across all supported platforms.
This library provides mechanisms to handle file system, DNS, network, child processes, pipes, signal handling, polling and streaming.
Complementary low-level components, mostly written in C/C++
- c-ares: A C library for asynchronous DNS requests, which is used for some DNS requests in Node.js.
- http-parser: A lightweight HTTP request/response parser library.
- OpenSSL: A well-known general-purpose cryptography library. Used in tls and crypto modules.
- zlib: A lossless data-compression library. Used in zlib module.
- Some bundled tools used in the Node.js infrastructure (among others):
  - npm: A well-known package manager (and ecosystem).
  - gyp: A Python-based project generator copied from V8. Used by node-gyp, a cross-platform command-line tool written in Node.js for compiling native addon modules.
  - gtest: Google’s C++ test framework. Used for testing native code.
Node.js under the hood
Most Node.js developers know that it’s built on top of V8 and libuv, a multi-platform C library that supports asynchronous I/O using a mixed Event Loop and Worker Pool (thread pool) architecture.
Here is a rather simplistic diagram showing how your JS code runs under the hood. Although it does not show everything that happens when you run a Node.js application, it does highlight the most important components of the runtime stack.
When your Node.js application starts, it first completes a startup phase (the start script), including requiring modules and registering callbacks for events. Once this step is completed, the application enters the Event Loop (a.k.a. the main thread, event thread and the like).
The Event Loop is implemented using the libuv library as a single-threaded, semi-infinite loop (offloading operations to the system kernel whenever possible). It’s called a semi-infinite loop because it does end at some point, when there is no more work left to be done. From the developer’s view, that’s the point when your program exits.
Note: Since most modern kernels are multi-threaded, they can handle multiple operations by executing them in the background. When one of these operations completes, the kernel notifies Node.js so that the appropriate callback may be added to the “poll queue” to eventually be executed.
Conceptually, the Event Loop is designed to respond to incoming client requests by executing the corresponding JS callback. JS callbacks are executed synchronously, but may use Node’s APIs to register asynchronous requests to continue processing after the callback completes.
The callbacks for these asynchronous requests will also be executed on the Event Loop. Examples of such Node APIs include the various timers (setTimeout(), setInterval(), etc.), functions from the fs and http modules, and many more. All of these APIs require a callback that will be triggered once the original operation is completed.
Incoming events (either from the application itself or from pending callbacks) are stored in different event queues and executed. The order in which they’re executed is defined according to “phases”, as follows:
- timers: this phase executes callbacks scheduled by setTimeout() and setInterval().
- pending callbacks: executes I/O callbacks deferred to the next loop iteration.
- idle, prepare: only used internally.
- poll: retrieve new I/O events; execute I/O related callbacks (almost all with the exception of close callbacks, the ones scheduled by timers, and setImmediate()); node will block here when appropriate.
- check: setImmediate() callbacks are invoked here.
- close callbacks: some close callbacks, e.g. socket.on('close', …).
Each phase has a FIFO (first-in, first-out) queue of callbacks to execute. While each phase is special in its own way, generally, when the event loop enters a given phase, it will perform any operations specific to that phase, then execute callbacks in that phase’s queue until the queue has been exhausted or the maximum number of callbacks has executed. When the queue has been exhausted or the callback limit is reached, the event loop will move to the next phase, and so on.
During the poll phase the Event Loop fulfills non-blocking, asynchronous requests (started via Node APIs) by using libuv’s abstractions for OS-specific I/O polling mechanisms. Each OS has its own library for this (epoll for Linux, IOCP for Windows, kqueue for BSD and MacOS, event ports in Solaris).
Among developers, it’s a common myth that Node.js is strictly single-threaded. Conceptually, this is true, as your JS code always runs on a single thread, within the Event Loop.
At the implementation level, however, libuv also includes a thread pool (also known as the Worker Pool), used for offloading operations that either cannot be performed asynchronously at the OS level or require too much processing and would otherwise block the Event Loop.
The Worker Pool is a fixed-size thread pool, as shown on the diagram, so any Node.js process has multiple threads running in parallel. The reason is that not all Node API operations can be executed in a non-blocking fashion on all supported operating systems.
The Event Loop is not suited for CPU (encryption, compression, etc.) or I/O (DNS, FileSystem, etc.) intensive operations, so these are offloaded to the Worker Pool. This prevents blocking the Event Loop, improving performance and throughput.
Using a Thread Pool is more efficient than having multiple operations waiting to run on a single thread. It also helps to avoid the considerable overhead of creating and destroying a thread every time the runtime requires a worker thread.
What does all this mean for my Node.js application?
Now, after all this digression, you should have a much better understanding of the overall Node.js architecture. Let’s discuss some guidelines for writing higher-performance, more secure server-side applications.
In terms of server-side web applications, e.g. RESTful services, all requests are processed concurrently within the Event Loop’s single thread.
So, if processing an HTTP request in your application spends a significant amount of time executing a JS function (e.g. performing a heavy calculation), it blocks the Event Loop for all other requests, making every request slow.
Thus, the first golden rule of Node.js is “never block the Event Loop”. Here is a short list of recommendations that will help you to follow this rule:
- Avoid performing heavy calculations synchronously. If you have any code with time complexity worse than O(n), consider optimizing it as much as possible or, at least, split the calculations into chunks that are recursively called via a timer API, such as setTimeout() or setImmediate(). This way you will not be blocking the Event Loop and other callbacks will get processed.
- Avoid any *Sync calls, like fs.readFileSync() or crypto.pbkdf2Sync(), in server applications. The only exception to this rule might be the startup phase of your application.
- Choose 3rd-party libraries wisely, as they might block the Event Loop, e.g. by running CPU- or I/O-intensive computations written in JS. Prefer libraries that rely on Worker Threads to perform these calculations.
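To illustrate the partitioning advice above, a long computation can be split into chunks that yield back to the Event Loop between iterations via setImmediate(). A minimal sketch (the function name and chunk size are arbitrary):

```javascript
// Sum the squares of 0..n-1 in chunks of 10,000 so other callbacks
// can run between chunks instead of waiting for the whole loop.
function sumSquares(n, callback) {
  let total = 0;
  let i = 0;
  const CHUNK = 10000; // arbitrary chunk size

  function doChunk() {
    const end = Math.min(i + CHUNK, n);
    for (; i < end; i++) {
      total += i * i;
    }
    if (i < n) {
      setImmediate(doChunk); // yield to the Event Loop, then continue
    } else {
      callback(total);
    }
  }

  doChunk();
}

let result = 0;
sumSquares(100000, (total) => {
  result = total;
  console.log(`sum of squares: ${total}`);
});
```

Each chunk runs to completion, but between chunks the Event Loop is free to process I/O and other callbacks.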
On the other hand, you should also use the Worker Pool wisely. As mentioned, it is a fixed-size thread pool with a default size of 4 threads, so if all available threads are in use, any requests waiting in the task queue will have to wait until one is freed. The pool size may be increased by setting a higher UV_THREADPOOL_SIZE environment variable, but in many cases this won’t solve the problem.
In other words, and in line with the Event Loop usage rules, the second golden rule of Node.js would be “block the Worker Pool wisely”. This can be achieved by:
- Avoiding long-running tasks on the Worker Pool. As an example, prefer stream-based APIs over reading whole files with fs.readFile() (which takes more time and more CPU and memory resources).
- Partitioning CPU-intensive tasks if possible.
- Once again, choose 3rd-party libraries wisely.
As a final conclusion, we may say that “Node.js is fast as long as the work required for each request, at any given time, is small and simple enough”. This rule covers both the Event Loop and the Worker Pool, and is essential for developing high-performance code.