I have a multi-threaded application that scales well at first, but on a 16-CPU server the performance levels off once I exceed 5 or 6 hardware threads. I suspect the bottleneck is one of the synchronized methods. However, I need to be sure that method is the culprit before I dive into the code and try to replace the algorithm with a non-blocking one.
Running Java with the -Xprof argument tells me that, as I expected, the threads are spending most of their time blocked. Is there a way that I can break that down into how much time they spend blocked at a particular method?
Try YourKit (http://yourkit.com): its monitor view will tell you which lock classes are hot and who is holding the contended locks, with a breakdown by lock instance and caller stack. The tool has a 30-day evaluation period.
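If you want a rough number without a profiler, the JDK's own java.lang.management API can report per-thread blocked counts and blocked time. The following is only a minimal sketch of that alternative, not what YourKit does internally: the class name BlockedTimeSketch, the hotMethod() stand-in for your suspected synchronized method, and the thread and iteration counts are all made up for illustration. It reports totals per thread, not per method.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class BlockedTimeSketch {
    private static final Object LOCK = new Object();

    // Hypothetical stand-in for the suspected synchronized method.
    static void hotMethod() {
        synchronized (LOCK) {
            try {
                Thread.sleep(20); // simulate work done while holding the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns {total blocked count, total blocked millis} across all workers.
    static long[] measure(int threads, int iterations) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (mx.isThreadContentionMonitoringSupported()) {
            // Blocked *time* is only collected when this is enabled.
            mx.setThreadContentionMonitoringEnabled(true);
        }
        CountDownLatch finished = new CountDownLatch(threads);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    hotMethod();
                }
                finished.countDown();
                try {
                    release.await(); // stay alive so the stats can still be read
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            workers[i].start();
        }
        long count = 0;
        long millis = 0;
        try {
            finished.await();
            for (Thread t : workers) {
                ThreadInfo info = mx.getThreadInfo(t.getId());
                if (info == null) {
                    continue; // thread is no longer alive
                }
                long blockedMs = info.getBlockedTime(); // -1 if monitoring is off
                System.out.printf("%s: blocked %d times, %d ms%n",
                        info.getThreadName(), info.getBlockedCount(), blockedMs);
                count += info.getBlockedCount();
                if (blockedMs > 0) {
                    millis += blockedMs;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the workers exit
        }
        return new long[] {count, millis};
    }

    public static void main(String[] args) {
        long[] totals = measure(4, 5);
        System.out.println("total blocked count=" + totals[0]
                + ", total blocked ms=" + totals[1]);
    }
}
```

To tie blocked time to a specific monitor, you could sample ThreadInfo while the threads are running: for a thread that is currently blocked, getLockName() and getStackTrace() show which lock it is waiting on and from where. Sampling like that only approximates what a profiler's monitor view gives you.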