I implemented the basic quicksort algorithm in Java, using the last element of the array as the pivot. Is it normal for it to take 5 hours to sort a 100,000,000-element array of random numbers?
My system specs:
Mac OS X Lion 10.7.2 (2011)
Intel Core i5 2.3 GHz
8 GB RAM
Update2: So I think I am doing something wrong in my other methods since Narendra was able to run the quicksort. Here is the full code I am trying to run.
import java.util.Random;

public class QuickSort {

    // long, because the count can exceed Integer.MAX_VALUE for 100,000,000 elements
    public static long comparisons = 0;

    public static void main(String[] args) {
        int size = 100000000;
        int[] smallSampleArray = createArrayOfSize(size);
        System.out.println("Starting QS1...");
        long startTime = System.currentTimeMillis();
        quickSort(smallSampleArray, 0, size - 1);
        // currentTimeMillis() measures milliseconds, not seconds
        System.out.println("Finished QS1 in " + (System.currentTimeMillis() - startTime) + " ms");
        System.out.println("Number of comparisons for QS1: " + comparisons);
    }

    // Fills an array with random values in the range 1..1000
    public static int[] createArrayOfSize(int arraySize) {
        int[] anArray = new int[arraySize];
        Random random = new Random();
        for (int x = 0; x < anArray.length; x++) {
            anArray[x] = random.nextInt(1000) + 1;
        }
        return anArray;
    }

    // Sorts anArray[position..pivot] in place; pivot is the index of the last element
    public static void quickSort(int[] anArray, int position, int pivot) {
        if (position < pivot) {
            int q = partition(anArray, position, pivot);
            quickSort(anArray, position, q - 1);
            quickSort(anArray, q + 1, pivot);
        }
    }

    // Partitions around the value at anArray[pivot] and returns its final index
    public static int partition(int[] anArray, int position, int pivot) {
        int x = anArray[pivot];
        int i = position - 1;
        // j < pivot (not pivot - 1), so every element before the pivot is compared
        for (int j = position; j < pivot; j++) {
            comparisons++;
            if (anArray[j] <= x) {
                i = i + 1;
                int temp = anArray[i];
                anArray[i] = anArray[j];
                anArray[j] = temp;
            }
        }
        int temp = anArray[i + 1];
        anArray[i + 1] = anArray[pivot];
        anArray[pivot] = temp;
        return i + 1;
    }
}
I’ve moved the old, now irrelevant answer to the end.
Edit x2
Aha! I think I’ve found the cause of your horrible performance. You told us you were using randomized data. That is true. But what you didn’t tell us is that you were using such a small range of possible random values.
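For me, your code performs very well if you change the line that fills the array, anArray[x] = random.nextInt(1000) + 1;, so that it draws from the full int range instead of from 1 to 1000 (for example, with random.nextInt()).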
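As a rough sketch of that change (the class name FullRangeArray and the helper distinctCount are hypothetical, added only for illustration), the array could be filled from the whole int range, which makes duplicate values rare:

```java
import java.util.Arrays;
import java.util.Random;

class FullRangeArray {
    // Same shape as createArrayOfSize in the question, but drawing from the
    // whole int range so that duplicate values become rare.
    static int[] createArrayOfSize(int arraySize) {
        int[] anArray = new int[arraySize];
        Random random = new Random();
        for (int x = 0; x < anArray.length; x++) {
            anArray[x] = random.nextInt(); // any int value, not just 1..1000
        }
        return anArray;
    }

    // Hypothetical helper: counts how many distinct values an array holds.
    static long distinctCount(int[] values) {
        return Arrays.stream(values).distinct().count();
    }

    public static void main(String[] args) {
        // Values in 1..1000, as in the question, versus the full int range.
        int[] narrow = new Random().ints(100_000, 1, 1001).toArray();
        int[] wide = createArrayOfSize(100_000);
        System.out.println("distinct values, range 1..1000: " + distinctCount(narrow));
        System.out.println("distinct values, full int range: " + distinctCount(wide));
    }
}
```

The narrow range can never hold more than 1,000 distinct values, however large the array is, so long runs of equal keys are guaranteed.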
That goes against expectations, right? It should be cheaper to sort a smaller range of values, since there should be fewer swaps to do. So why does this happen? It happens because you have so many elements with the same value: with 100,000,000 elements and only 1,000 possible values, each value appears about 100,000 times on average. So why does this lead to such horrible performance? Well, say at each point you chose a perfect pivot value: exactly halfway. Here’s what it would look like: 100,000,000 elements split into 2 partitions of 50,000,000, then 4 partitions of 25,000,000, then 8 partitions of 12,500,000.
And so on. However (and here’s the critical part), you would eventually reach a partition operation where every single value is equal to the pivot value. In other words, there will be a big block (about 100,000 elements) of numbers with the same value that you will try to sort recursively. And how will that go? It will recurse about 100,000 times, removing only the single pivot value at each level. In other words, it will partition everything to the left or everything to the right.
Expanding on the breakdown above, it would look kind of like this (I’ve used 8, a power of 2, for simplicity, and forgive the bad graphical representation)
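A small trace shows the same thing in code. This is only a sketch under some assumptions: the class name EqualKeysDemo and the depth counter are hypothetical, and the partition loop is written as j < hi so that every element before the pivot is compared. It sorts a block of 8 equal values and prints where each partition splits:

```java
class EqualKeysDemo {
    static int maxDepth = 0;

    // Last-element-pivot partition in the style of the question.
    static int partition(int[] a, int lo, int hi) {
        int x = a[hi];
        int i = lo - 1;
        for (int j = lo; j < hi; j++) {
            if (a[j] <= x) { // always true when every value is equal
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t;
        return i + 1;
    }

    // Quicksort that also records how deep the recursion goes.
    static void quickSort(int[] a, int lo, int hi, int depth) {
        if (lo < hi) {
            maxDepth = Math.max(maxDepth, depth);
            int q = partition(a, lo, hi);
            System.out.println("depth " + depth + ": range [" + lo + ", " + hi
                    + "] splits at index " + q);
            quickSort(a, lo, q - 1, depth + 1); // everything except the pivot
            quickSort(a, q + 1, hi, depth + 1); // empty
        }
    }

    public static void main(String[] args) {
        int[] equal = {7, 7, 7, 7, 7, 7, 7, 7};
        quickSort(equal, 0, equal.length - 1, 1);
        System.out.println("max depth: " + maxDepth); // prints "max depth: 7"
    }
}
```

Every split lands on the last index of the range, so a block of n equal values takes n - 1 partition levels, each scanning nearly the whole remaining block. That is quadratic work per block of duplicates.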
To counter this, you need to change your code so that long runs of equal values no longer trigger this degenerate recursion. More on that to come (I hope)…
…and continued. An easy way to solve your problem is to check, at each step, whether the subarray you are about to partition is already sorted.
Add that and you won’t recurse unnecessarily and you should be golden. In fact, you get better performance than you do with values randomized over all 32 bits of the integer.
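A minimal sketch of what such a check might look like follows. The helper name isSorted and the class name SortedCheckQuickSort are assumptions, not taken from the original post, and the partition loop uses j < hi so that every element before the pivot is compared:

```java
import java.util.Random;

class SortedCheckQuickSort {
    // Hypothetical helper: true when a[lo..hi] is already in non-decreasing order.
    // A block of equal values counts as sorted, so it is never partitioned.
    static boolean isSorted(int[] a, int lo, int hi) {
        for (int k = lo; k < hi; k++) {
            if (a[k] > a[k + 1]) {
                return false;
            }
        }
        return true;
    }

    // Last-element-pivot partition in the style of the question.
    static int partition(int[] a, int lo, int hi) {
        int x = a[hi];
        int i = lo - 1;
        for (int j = lo; j < hi; j++) {
            if (a[j] <= x) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t;
        return i + 1;
    }

    // Only partitions and recurses when the range is not sorted yet.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo < hi && !isSorted(a, lo, hi)) {
            int q = partition(a, lo, hi);
            quickSort(a, lo, q - 1);
            quickSort(a, q + 1, hi);
        }
    }

    public static void main(String[] args) {
        // Same narrow value range as the question, on a smaller array.
        int[] data = new Random().ints(1_000_000, 1, 1001).toArray();
        long start = System.currentTimeMillis();
        quickSort(data, 0, data.length - 1);
        System.out.println("sorted: " + isSorted(data, 0, data.length - 1)
                + " in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

The check costs one extra pass over each range in the worst case, but it usually exits at the first out-of-order pair, and it stops the recursion as soon as a range consists of a single repeated value.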
Old answer (for posterity only)
Your partitioning logic looks really suspect to me. Let’s extract and ignore the swap logic. Here’s what you have:
I fail to see how this would work at all. For example, if the very first value were less than the pivot value, it would be swapped with itself?
I think you want something like this (just a rough sketch):
Edit
I think I understand the original partitioning logic now (I had confused the if-block to be looking at elements greater than the pivot). I’ll leave my answer up on the off chance that it delivers better performance but I doubt it would make a significant difference.
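For what it’s worth, the self-swap I worried about is harmless, which a small trace confirms. This is a sketch only: the class name SelfSwapTrace and the selfSwaps counter are hypothetical, and the loop uses j < hi so that every element before the pivot is visited.

```java
import java.util.Arrays;

class SelfSwapTrace {
    static int selfSwaps = 0;

    // Partition in the style of the question, counting swaps of an element with itself.
    static int partition(int[] a, int lo, int hi) {
        int x = a[hi];
        int i = lo - 1;
        for (int j = lo; j < hi; j++) {
            if (a[j] <= x) {
                i++;
                if (i == j) {
                    selfSwaps++; // swapping a[i] with itself changes nothing
                }
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[hi]; a[hi] = t;
        return i + 1;
    }

    public static void main(String[] args) {
        int[] a = {2, 9, 1, 8, 5};
        int q = partition(a, 0, a.length - 1);
        // prints "pivot index 2, array [2, 1, 5, 8, 9], self-swaps 1"
        System.out.println("pivot index " + q + ", array " + Arrays.toString(a)
                + ", self-swaps " + selfSwaps);
    }
}
```

The first element (2) is at most the pivot (5), so it is swapped with itself, and the partition still comes out right: everything left of index 2 is at most 5 and everything to the right is greater.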