I’m currently investigating issues on the following system:
- 3.2 GHz 8-core machine, 24 GB RAM
- Debian 6.0.2
- ulimit -n 4096
- ulimit -Sn 4096
- ulimit -Hn 65535
- Tomcat 6.0.28
- -Xmx20g
- MySQL 5.0.51a (through hibernate and a few manual JDBC queries)
- also plenty of room for caching
I’m testing the most common requests to the server remotely at 2000 requests per minute, using the latest jMeter. The average response time is around 65 ms, the minimum is 35 ms and the maximum is 4000 ms (in rare cases, but there is a reason for that).
Judging by htop, the system specs are sufficient for at least three times as many requests per minute (avg. CPU: 25%, RAM: 5 of 22 GB). The server itself stays reachable the whole time; I ping it constantly while the test runs.
It is important to note that each request results in 3 additional requests to the local Tomcat, where the second one finally gets the required data and the last one is for statistics:
jMeter(1) -> RESTeasy-Service(2) -> ?-Service(2) -> Data-Service(2) -(new Thread)> Statistic-Service(2)
(1) is my jMeter test machine, remote from (2), which is the Tomcat server. Yes, the architecture might be a little weird, but that’s not my fault. ^^
I switched the thread management to a pool in server.xml, raising max threads from the default 200 to 1000 and idle threads from 4 to 10. What I noticed is that the number of concurrent threads almost never decreases; instead it seems to rise steadily up to Tomcat’s maximum. htop reports 160 threads while Tomcat is stopped and about 460 when it has just been started. (The services seem to start a few…) After a few hours (sometimes less) of hitting the server with 2000 requests per minute, htop shows 1400 tasks. This seems to be the point at which I start to get timeouts in jMeter. As this is extremely time-consuming, I have not observed it many times and therefore can’t guarantee this is the cause, but that’s pretty much what happens.
Primary questions:
- Math tells me that the number of concurrently used threads should never exceed about 600 (34 requests per second * 4 requests each * 4 seconds = 544; in practice even less, but an estimate of 600 should be fine). As far as I understand thread pooling, unused threads should be released and stopped when they have been idle for too long. Is there still a way I could end up with a thousand idling(?) threads? And is this OK?
- Could a thread started manually in one of the request processors prevent the Tomcat threads from being released?
- Shouldn’t there be a log message telling me that Tomcat could not create/fetch a thread for a request?
- Any other ideas? I’ve been working on this for far too long, and Tomcat exhausting its thread pool now seems the only plausible reason for these weird timeouts. But maybe somebody has another hint.
Thanks in advance especially if you can finally save me from this…
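The estimate in the first question is an application of Little's law: threads in use = arrival rate * time each request holds a thread. Here is a minimal sketch of that arithmetic. The class and method names are made up for illustration, and it assumes, as the question does, that every one of the 4 hops holds its own thread for the given time.

```java
public class ThreadEstimate {

    /**
     * Little's law: busy threads = arrivals per second * hops per request
     * * seconds each hop holds its thread. Rounded up to whole threads.
     */
    public static long busyThreads(double requestsPerMinute, int hopsPerRequest, double secondsPerHop) {
        double arrivalsPerSecond = requestsPerMinute / 60.0;
        return (long) Math.ceil(arrivalsPerSecond * hopsPerRequest * secondsPerHop);
    }

    public static void main(String[] args) {
        // Worst case: every hop holds its thread for the 4 s maximum.
        System.out.println(busyThreads(2000, 4, 4.0));   // 534
        // Upper bound for the typical case: every hop holds its thread for the 65 ms average.
        System.out.println(busyThreads(2000, 4, 0.065)); // 9
    }
}
```

The worst case comes out slightly below the 544 in the question only because 2000/60 is not rounded up to 34 first. Either way, the steady-state figure is far below 1000, so a pool that fills up anyway suggests that threads are being held much longer than the measured response times.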
After hours and days of head-scratching I found that the timeouts happen when Tomcat reaches its thread limit while we’re in the middle of those 3 local connection openings. My guess is that once it reaches that limit, each thread is waiting for another one to become available, which will not happen while the existing ones do not finish. In German I’d call that a Teufelskreis (vicious circle). ^^
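The circular wait can be reproduced outside Tomcat. The following is only a sketch under assumptions: a plain java.util.concurrent fixed pool stands in for Tomcat's executor, and one nested call stands in for the three local requests. The class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NestedPoolDemo {

    /**
     * Runs `requests` outer tasks on a pool of `poolSize` threads. Each outer task
     * submits one inner task to the SAME pool and blocks until it returns.
     * Returns true if every outer task finished within the timeout.
     */
    public static boolean completes(int poolSize, int requests, long timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Makes the outer tasks wait until as many of them as possible hold a thread at once.
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, requests));
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                started.countDown();
                try {
                    started.await();
                    // The nested "local request": needs another thread from the same pool.
                    pool.submit(() -> "data").get();
                    done.countDown();
                } catch (InterruptedException | ExecutionException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow(); // interrupts any outer task still stuck waiting
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Every thread is held by an outer request, so no inner call can ever run.
        System.out.println(completes(4, 4, 500)); // false
        // Spare threads remain for the inner calls, so everything finishes.
        System.out.println(completes(8, 4, 500)); // true
    }
}
```

Once every pool thread is held by an outer task, the inner tasks sit in the queue forever and nothing is logged, which would match timeouts that appear without an error message. A larger pool only moves the point at which this happens.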
Whatever, the solution was to raise max threads to a ridiculously high number:
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-" maxThreads="10000" minSpareThreads="10"/>
I know that this should not be the way to go, but unfortunately all of us here know that our architecture is somewhat impractical, and nobody has the time to change anything about it.
Hope it helps somebody. =)