This blog will review cluster environment setup to handle node multi-threaded service instances. In our last post, we reviewed the methods of using pooled connections for MySQL connector in node.js on a Linux server running on an Amazon AWS EC2 instance. The conclusion was that it is necessary from the beginning of the application development process to write javascript back-end application server code that uses MySQL pooling methods. This is a significant change in the application code from the non-pooled connection method API. It is important to understand the techniques you intend to use before setting down to write your production code for a back-end application server because it’s important to write your application once and do it right the first time.

Now let’s take a look at clusters. This is a built-in part of the node.js methods. (We are not talking about the clusters add-on module from npmjs.com that goes by the same name.) Using the built in clusters methods for node.js is described in multiple blog posts here and here. The documentation and posts describe setting up a cluster.js that runs a master and fork process servers that run your app.js code multiple times utilizing multiple core processor threads. This increases throughput and creates a multi-threaded environment. The simplified node.js cluster environment code is shown here.

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {

//  for (var i = 0; i < numCPUs; i++){
  for (var i = 0; i < 10; i++){
    cluster.fork();
  }

  cluster.on('exit',function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {

    //change this line to Your Node.js app entry point.
    require("./app.js");
}

In simplified terms, the code above creates multiple forks using a Master node cluster to run multiple instances (multi-threaded) of your node.js application. We did some testing on our Amazon AWS EC2 t2.micro (free tier) and using our async-test.js test, from our prior blog post, using an external RDS MySQL database with pooled MySQL connections. We used Siege of up to 500 concurrent connections and hit it constantly for 1 minute resulting in thousands of hits. What we found is a single process environment resulted in database connection errors while a cluster run environment of 10 processes (multi-threaded) running the same async-test resulted in no connection errors. This proves a cluster multi-threaded environment is important.

When you look at my code above you can see I altered the code to force 10 process forks (instead of using numCPUs.) Remember I am running this on a virtual shared server using Amazon AWS. When using the os.cpus ().length the os method reports back 1 core process thread.  This is the number of process threads available as designed by AWS. But a little digging into the full array results of the os.cpus () method reveals my AWS t2.micro is actually running on a XEON E5 2670 2.5ghz processor with 10 cores and 20 threads. If you did this on a single dedicated machine running that processor you would get a result of 20 for the os.cpus ().length method. Actually, Amazon does a lot of behind the scenes stuff to throttle the processes. But getting more than what you pay for is not at issue here. (Even though you may be getting it all for free through the free tier.) The issue is a working production application design that doesn’t fail at critical points like peak database traffic. So what we found is the processor even on the AWS t2.micro (free tier) was able to better handle the traffic on multiple clustered processes and it could handle a huge amount of traffic. One seige test resulted in no connection errors of over 8000 hits in 30 seconds. Since it is a shared server it is doubtful 20 threads would be useful. It’s interesting to note Amazon actually defines the t2.micro with 1 vCPU and 6 CPU credits/hour which is a throttle mechanism and different from actual core processing threads. Amazon uses processor credits to ensure you get what you pay for or to throttle your application as needed. As in life, you only get what you pay for! 🙂

But do we write code and change our application design to handle the cluster methods. No! We don’t change our code to use the cluster methods. There is a better way and it’s all handled for us using the PM2 module from npmjs.com. No need to create code like the cluster.js sample above. This multi-threading cluster environment, base node system control, monitoring, and performance optimization is likely a place where you don’t need to reinvent the wheel, there is already significant well-designed products to handle these functions. To start, just install the PM2 module with

npm install -g pm2

A useful additional tool in connection wth PM2 is the https://app.keymetrics.io dashboard monitor. You can go there and create a bucket and server connection to your PM2 server to generate metrics data and external monitoring and control. Use the PM2 documentation for all the PM2 commands but some of the useful commands are

Pm2 start app.js

Pm2 stop all

Pm2 start app.js -i 4  # to start 4 cluster instances of your app.js

Pm2 list  # to list all running process threads

There is another module called forever but we found PM2 is much more advanced, robust, and well supported.

One final item to do is setup PM2 to run on startup. Do the following command which creates a line item to copy and paste to your terminal window. The one line of code it creates will automatically start PM2 on reboot.

pm2 startup

What we do need to do in our application design is write good strong well-designed code that handles multiple instances running and save session or environment variables that can be accessed by all cluster process instances in a multi-threaded environment. In a later post we will cover using Redis as a global external store for process variables. As well codeforgeek provides a great tutorial. This is the real important part of application development. The two most important parts that must be designed to handle multi-threaded clusters and multi-instances are session like variables and database connection transactions. For example, three related SQL inserts or Redis store SETS must complete before another process tries to select that same and related set of data.

In conclusion, I recommend installing pm2 and use it from the start of the agile application development process. An added benefit of using pm2 in development is added logging and debugging methods.

Advertisements