I think I’m using threads wrong, so I wanted to ask whether this is good design. Basically, I have a program that pulls data from a queue and then processes it (the processing is pure math, so it’s 100% CPU-intensive). If the data is good, it’s sent to a ‘good’ queue; otherwise it’s either discarded completely or part of it is sent back to the initial ‘work’ queue for further processing. That’s the high-level logic. When my queue was in memory, my program used all cores and was really fast. As my data grew, I decided to use a queue server to store the queue and distribute the processing over multiple machines, and now it’s slow (only 40%-60% of each core is being used).
I tried to profile my code (using YourKit and the profiler built into NetBeans), and it says most of the time (80%) is spent on the queue program. I thought I could keep the number crunching going constantly by pushing all the external program stuff to another thread, but it’s not helping performance and I’m wondering if I’m doing it wrong. I’m not sure, but if I launch a thread (child thread) from an existing thread (parent thread), does the child have to complete before the parent can finish?
My code is quite large and 99% of it isn’t relevant, so I’ll just write a high-level version of it (it may not compile, but it should give you an idea of what I’m doing).
public class worker {
    private static ExecutorService executor;

    static {
        final int numberOfThreads = 4;
        executor = new ThreadPoolExecutor(numberOfThreads, numberOfThreads, 1000, TimeUnit.SECONDS, new LinkedBlockingDeque<Runnable>());
    }

    public static void main(String[] args) throws IOException, ShutdownSignalException, ConsumerCancelledException, InterruptedException {
        System.out.println("starting worker..");
        //Connection information goes here
        channel.basicQos(50); //this is part of the connection, the queue server only gives 50 messages without acknowledgment
        while (true) {
            QueueingConsumer.Delivery delivery = consumer.nextDelivery(); //gets data from queue
            String message = new String(delivery.getBody());
            executor.submit(new DoWork(channel, message, delivery));
        }
    }
}

class DoWork implements Runnable { //where the main work happens
    //setup variables, basically passing queue connection information as well as data here, so I only need to rely on one connection
    public void run() {
        new Thread(new Awk_to_Queue(channel, delivery)).start(); //this sends the ack to the queue server, I launch a thread for this so my program can keep working.
        if (data is good) {
            new Thread(new Send_to_Queue("success_queue", message1, channel)).start();
            return;
        } else if (data is not good but not bad either) {
            new Thread(new Send_to_Queue("task_queue", message2, channel)).start();
        }
    }
}

class Send_to_Queue implements Runnable {
    public void run() {
        //takes data in and sends it to the queue the way I previously did, but just does it in a thread. queue connection is passed over so I only need to have one connection
    }
}

class Awk_to_Queue implements Runnable {
    public void run() {
        //acks the delivery so the queue server can send one more piece of data to queue up
    }
}
There it is. I’m sorry if it’s a bit hard to read (I deleted a lot of stuff just to show you the structure of what I’m doing). What am I doing wrong that forking threads did not affect speed (I don’t see it going faster, nor have the profiler’s results changed)? Is the problem with the way I’m forking threads (new Thread(new Awk_to_Queue(channel, delivery)).start();) or is it something else, like my threading design?
Two things come to mind:
1) The only thread reading the remote queue appears to be the main thread running your infinite loop in the main() method. However fast you stuff things into it, you’ll never process them faster than you can take them back out.
2) Spawning new Thread()s is an “expensive” operation. Constantly creating new threads for single short tasks just churns through memory allocation and native resources. You should offload those “queue puts” to a second ExecutorService whose size you can tune, rather than spawning an unbounded number of threads.
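To make the second point concrete, here is a rough sketch of what I mean by a second ExecutorService. It is not your code: QueueClient is a hypothetical stand-in for whatever you do with the channel (publish or ack), the "is the data good" check is a placeholder for your math, and the pool sizes are guesses you would have to tune.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class IoPoolSketch {
    // Hypothetical stand-in for the real publish/ack calls on the channel.
    interface QueueClient {
        void send(String queueName, String message);
    }

    // CPU-bound number crunching: one thread per core.
    private final ExecutorService cpuPool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    // Queue I/O: a small, fixed set of reused threads instead of a new Thread per message.
    private final ExecutorService ioPool;
    private final QueueClient client;

    IoPoolSketch(QueueClient client, int ioThreads) {
        this.client = client;
        this.ioPool = Executors.newFixedThreadPool(ioThreads);
    }

    void submit(String message) {
        cpuPool.submit(() -> {
            // Placeholder for the real math that decides whether the data is good.
            boolean good = message.length() % 2 == 0;
            String target = good ? "success_queue" : "task_queue";
            // Hand the slow network call to the I/O pool so this worker is free again.
            ioPool.submit(() -> client.send(target, message));
        });
    }

    void shutdownAndWait() {
        try {
            // Drain the CPU pool first: its tasks are what feed the I/O pool.
            cpuPool.shutdown();
            cpuPool.awaitTermination(10, TimeUnit.SECONDS);
            ioPool.shutdown();
            ioPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while shutting down", e);
        }
    }

    // Small demo: the "queue server" is just a thread-safe list recording what was sent.
    static List<String> runDemo(int n) {
        List<String> sent = new CopyOnWriteArrayList<>();
        IoPoolSketch sketch = new IoPoolSketch((queue, msg) -> sent.add(queue + ":" + msg), 2);
        for (int i = 0; i < n; i++) {
            sketch.submit("m" + i);
        }
        sketch.shutdownAndWait();
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(20).size() + " messages sent");
    }
}
```

The number of I/O threads is the knob to experiment with. If your channel turns out not to be safe for concurrent use, a single-thread I/O pool has the side benefit of serializing all access to it.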