Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9014797
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 16, 20262026-06-16T03:35:56+00:00 2026-06-16T03:35:56+00:00

I am running a fairly large-scale Node.js 0.8.8 app using Cluster with 16 worker

  • 0

I am running a fairly large-scale Node.js 0.8.8 app using Cluster with 16 worker processes on a 16-processor box with hyperthreading (so 32 logical cores). We are finding that since moving to the Linux 3.2.0 kernel (from 2.6.32), the balancing of incoming requests between worker child processes seems be heavily weighted to 5 or so processes, with the other 11 not doing much work at all. This may be more efficient for throughput, but seems to increase request latency and is not optimal for us because many of these are long-lived websocket connections that can start doing work at the same time.

The child processes are all accepting on a socket (using epoll), and while this problem has a fix in Node 0.9 (https://github.com/bnoordhuis/libuv/commit/be2a2176ce25d6a4190b10acd1de9fd53f7a6275), that fix does not seem to help in our tests. Is anyone aware of kernel tuning parameters or build options that could help, or are we best-off moving back to the 2.6 kernel or load balancing across worker processes using a different approach?

We boiled it down to a simple HTTP Siege test, though note that this is running with 12 procs on a 12-core box with hyperthreading (so 24 logical cores), and with 12 worker processes accepting on the socket, as opposed to our 16 procs in production.

HTTP Siege with Node 0.9.3 on Debian Squeeze with 2.6.32 kernel on bare metal:

reqs pid
146  2818
139  2820
211  2821
306  2823
129  2825
166  2827
138  2829
134  2831
227  2833
134  2835
129  2837
138  2838

Same everything except with the 3.2.0 kernel:

reqs pid
99   3207
186  3209
42   3210
131  3212
34   3214
53   3216
39   3218
54   3220
33   3222
931  3224
345  3226
312  3228
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-16T03:35:57+00:00Added an answer on June 16, 2026 at 3:35 am

    Don’t depend on the OS’s socket multiple accept to balance load across web server processes.

    The Linux kernels behavior differs here from version to version, and we saw a particularly imbalanced behavior with the 3.2 kernel, which appeared to be somewhat more balanced in later versions. e.g. 3.6.

    We were operating under the assumption that there should be a way to make Linux do something like round-robin with this, but there were a variety of issues with this, including:

    • Linux kernel 2.6 showed something like round-robin behavior on bare metal (imbalances were about 3-to-1), Linux kernel 3.2 did not (10-to-1 imbalances), and kernel 3.6.10 seemed okay again. We did not attempt to bisect to the actual change.
    • Regardless of the kernel version or build options used, the behavior we saw on a 32-logical-core HVM instance on Amazon Web services was severely weighted toward a single process; there may be issues with Xen socket accept: https://serverfault.com/questions/272483/why-is-tcp-accept-performance-so-bad-under-xen

    You can see our tests in great detail on the github issue we were using to correspond with the excellent Node.js team, starting about here: https://github.com/joyent/node/issues/3241#issuecomment-11145233

    That conversation ends with the Node.js team indicating that they are seriously considering implementing explicit round-robin in Cluster, and starting an issue for that: https://github.com/joyent/node/issues/4435, and with the Trello team (that’s us) going to our fallback plan, which was to use a local HAProxy process to proxy across 16 ports on each server machine, with a 2-worker-process Cluster instance running on each port (for fast failover at the accept level in case of process crash or hang). That plan is working beautifully, with greatly reduced variation in request latency and a lower average latency as well.

    There is a lot more to be said here, and I did NOT take the step of mailing the Linux kernel mailing list, as it was unclear if this was really a Xen or a Linux kernel issue, or really just an incorrect expectation of multiple accept behavior on our part.

    I’d love to see an answer from an expert on multiple accept, but we’re going back to what we can build using components that we understand better. If anyone posts a better answer, I would be delighted to accept it instead of mine.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm using Eclipse 3.4 with WTP 3.0.2 and running a fairly large Dynamic Web
We're running a fairly complex app as a portlet on Websphere Portal Server 5.1
So I currently have an application that is fairly complex with large-scale animations and
I am working on a fairly large MVC 3 application, and I'm running into
I have a Python program that processes fairly large NumPy arrays (in the hundreds
We're using Maven/Surefire and Spring/Hibernate transactional tests for a fairly large web application. There
We're using EF4 in a fairly large system and occasionally run into problems due
I'm running a fairly large web application which contains an increasing amount of JQuery
I'm with a fairly large team and we are running into problem with the
We've got a fairly large app that's going up on heroku... It's an app

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.