I am sorting a big file by reading it in chunks (ArrayList), sorting each ArrayList using Collections.sort with a custom comparator, writing the sorted results to files, and then applying a merge sort algorithm across all the files.
I do it in one thread.
Will I get any performance boost if I start a new thread for every Collections.sort()?
By this I mean the following:
I read from the file into a List; when the List is full, I start a new thread that sorts this List and writes it to a temp file.
Meanwhile, I continue to read from the file and start a new thread when the List is full again…
Another question that I have:
What is better for sorting:
1) An ArrayList that I fill and, when it's full, sort with Collections.sort()
2) A TreeMap that I fill; I don't need to sort it (it sorts as I insert items)
NOTE: I use Java 1.5
UPDATE:
This is the code I want to use. The problems are that I am reusing the datalines ArrayList while it is still being used by the threads, and that I also need to wait until all threads complete.
How do I fix this?
    int MAX_THREADS = Runtime.getRuntime().availableProcessors();
    ExecutorService executor = Executors.newFixedThreadPool(MAX_THREADS);
    // Problem: this single list is shared by the reader and every sorting task
    List datalines = new ArrayList();
    try {
        while (data != null) {
            long currentblocksize = 0;
            // Fill the list until the block size is reached or the input ends
            while ((currentblocksize <= blocksize) && (data = getNext()) != null) {
                datalines.add(data);
                currentblocksize += data.length();
            }
            // Sort the chunk in a worker thread and collect the result
            executor.submit(new Runnable() {
                public void run() {
                    Collections.sort(datalines, mycomparator);
                    vector.add(datalines);
                }
            });
I suggest implementing the following scheme, known as a farm: one thread reads a chunk from the file and hands it to a worker thread (best practice is to run the workers in an ExecutorService) to sort it, and each worker then sends its output to the writer thread, which puts it in a temp file.
Edit: OK, I've looked at your code. To fix the issue with the shared datalines, give each thread a private member that stores the current datalines array that the thread needs to sort.
You also need to synchronize access to the shared vector collection.
Finally, wait for all threads in the ExecutorService to finish before you start merging the results.
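The three fixes above could look roughly like the sketch below. It is only an illustration under some assumptions, not the original answer's code: the names ChunkSorter, SortTask, sortInChunks and results are made up for this example, the input is modelled as an Iterator<String> instead of your getNext(), and the shared vector is replaced by a list wrapped with Collections.synchronizedList. It sticks to what Java 1.5 offers (no lambdas, no @Override on interface methods, TimeUnit.SECONDS rather than MINUTES).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, named for this example only.
final class ChunkSorter {

    // Each task owns its chunk as a private member, so the reader
    // thread never touches a list that a worker is sorting.
    static final class SortTask implements Runnable {
        private final List<String> chunk;
        private final Comparator<String> comparator;
        private final List<List<String>> results;

        SortTask(List<String> chunk, Comparator<String> comparator,
                 List<List<String>> results) {
            this.chunk = chunk;
            this.comparator = comparator;
            this.results = results;
        }

        public void run() {
            Collections.sort(chunk, comparator);
            results.add(chunk); // safe: results is a synchronized list
        }
    }

    static List<List<String>> sortInChunks(Iterator<String> source, long blocksize,
                                           Comparator<String> comparator)
            throws InterruptedException {
        int maxThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(maxThreads);
        // Synchronized wrapper so concurrent add() calls from workers are safe.
        List<List<String>> results =
                Collections.synchronizedList(new ArrayList<List<String>>());
        try {
            while (source.hasNext()) {
                // A fresh list per chunk instead of reusing one shared list.
                List<String> datalines = new ArrayList<String>();
                long currentblocksize = 0;
                while (currentblocksize <= blocksize && source.hasNext()) {
                    String data = source.next();
                    datalines.add(data);
                    currentblocksize += data.length();
                }
                executor.submit(new SortTask(datalines, comparator, results));
            }
        } finally {
            executor.shutdown(); // accept no new tasks; queued ones still run
        }
        // Block until every submitted task has finished.
        while (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
            // keep waiting
        }
        return results;
    }
}
```

In the real program each task would probably write its sorted chunk to a temp file rather than keep it in memory, and the chunks may finish in any order, which does not matter for the later merge. Note also that a fixed thread pool queues tasks without limit, so if reading is much faster than sorting, several unsorted chunks can pile up in memory at once.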