Here is the class on gist
https://gist.github.com/2605302
I have tested it multiple times with different files, and even when binary search does fewer comparisons, the time taken is ALWAYS longer. What’s going wrong?
public static int linerSearch (String array [], String word, long resultsArray [])
{
    int comparisons = 0;
    int pos = -1;
    // I have started the timer where the search actually starts
    long start = System.nanoTime ();
    for (int i = 0; i < array.length; i++){
        comparisons = comparisons + 1;
        if (array [i].equals (word)){
            pos = i;
            break;
        }
    }
    long stop = System.nanoTime ();
    long total = stop - start;
    resultsArray [0] = total;
    resultsArray [1] = (long) array.length;
    resultsArray [2] = (long) comparisons;
    return pos;
}
Here is the binarySearch method:
// requires: import java.util.Arrays;
public static int binarySearch (String [] array, String word, long resultsArray []) {
    int start = 0;
    int end = array.length - 1;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;
    long start2 = System.nanoTime ();
    Arrays.sort (array);
    while (start <= end) {
        midPt = (start + end) / 2;
        comparisons2 = comparisons2 + 1;
        if (array [midPt].equalsIgnoreCase (word)) {
            pos = midPt;
            break;
        }
        else if (array [midPt].compareToIgnoreCase (word) < 0) {
            start = midPt + 1;
            comparisons2 = comparisons2 + 1;
            // The comparisons2 additions inside this else-if and the next one are a workaround to avoid
            // breaking up the else-if chain: if execution reaches the last else-if, two comparisons
            // have been done after the first one.
        } else if (array [midPt].compareToIgnoreCase (word) > 0) {
            comparisons2 = comparisons2 + 2;
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime ();
    long total2 = stop2 - start2;
    resultsArray [0] = total2;
    resultsArray [1] = (long) array.length;
    resultsArray [2] = (long) comparisons2;
    return pos;
}
Edit: I should also add that I tried it on an already sorted array, without the sorting line, and binary search still took longer when it shouldn’t have.
Okay, I’ve got this worked out for you once and for all. First, here’s the binary search method as I used it:
You’ll notice that I reduced the number of comparisons by saving the comparison result and using that.
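The idea of saving the comparison result can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code used for the measurements in this answer: the class name SavedComparisonSearch and the variable names are made up for the example, and the array is assumed to be sorted ignoring case already.

```java
final class SavedComparisonSearch {

    /**
     * Binary search that calls compareToIgnoreCase once per loop iteration
     * and reuses the result for all three branches.
     * Assumes the array is already sorted ignoring case.
     * results[0] = elapsed ns, results[1] = array length, results[2] = comparisons.
     */
    static int binarySearch(String[] sorted, String word, long[] results) {
        int low = 0;
        int high = sorted.length - 1;
        int pos = -1;
        int comparisons = 0;
        long startTime = System.nanoTime();
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = sorted[mid].compareToIgnoreCase(word); // the only string comparison per iteration
            comparisons++;
            if (cmp == 0) {
                pos = mid;
                break;
            } else if (cmp < 0) {
                low = mid + 1;   // word is in the upper half
            } else {
                high = mid - 1;  // word is in the lower half
            }
        }
        results[0] = System.nanoTime() - startTime;
        results[1] = sorted.length;
        results[2] = comparisons;
        return pos;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "Banana", "cherry", "Date", "elder"};
        long[] results = new long[3];
        int pos = binarySearch(words, "DATE", results);
        System.out.println("pos=" + pos + ", comparisons=" + results[2]);
    }
}
```

Compared with the method in the question, each loop iteration here costs one string comparison instead of up to three.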
Next, I downloaded this list of 235882 words. It is already sorted ignoring the case. Then, I built a test method that loads the contents of that file into an array and then uses both of those searching methods to find every word of that list. It then averages the times and numbers of comparisons for each method separately.
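A test method along those lines might look like the sketch below. It is only an approximation of the setup described above: the class SearchBenchmark and its methods are hypothetical, and a synthetic, already sorted word list stands in for the downloaded file so that the example is self-contained.

```java
import java.util.Arrays;

final class SearchBenchmark {

    // Linear search with equals; counter[0] accumulates the comparisons made.
    static int linearSearch(String[] array, String word, long[] counter) {
        for (int i = 0; i < array.length; i++) {
            counter[0]++;
            if (array[i].equals(word)) return i;
        }
        return -1;
    }

    // Binary search with compareTo; the array must be sorted case-sensitively.
    static int binarySearch(String[] sorted, String word, long[] counter) {
        int low = 0, high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            counter[0]++;
            int cmp = sorted[mid].compareTo(word);
            if (cmp == 0) return mid;
            if (cmp < 0) low = mid + 1; else high = mid - 1;
        }
        return -1;
    }

    /**
     * Searches for every word of the list with both methods and returns
     * {avg linear ns, avg linear comparisons, avg binary ns, avg binary comparisons}.
     */
    static double[] averages(String[] sorted) {
        long[] linearCount = {0}, binaryCount = {0};
        long linearNs = 0, binaryNs = 0;
        for (String word : sorted) {
            long t0 = System.nanoTime();
            int p1 = linearSearch(sorted, word, linearCount);
            long t1 = System.nanoTime();
            int p2 = binarySearch(sorted, word, binaryCount);
            long t2 = System.nanoTime();
            if (p1 < 0 || p2 < 0) throw new IllegalStateException("word not found: " + word);
            linearNs += t1 - t0;
            binaryNs += t2 - t1;
        }
        double n = sorted.length;
        return new double[] {linearNs / n, linearCount[0] / n, binaryNs / n, binaryCount[0] / n};
    }

    // Stand-in for the word file: n distinct words, sorted case-sensitively.
    static String[] syntheticWords(int n) {
        String[] words = new String[n];
        for (int i = 0; i < n; i++) words[i] = String.format("word%06d", i);
        Arrays.sort(words);
        return words;
    }

    public static void main(String[] args) {
        double[] r = averages(syntheticWords(20000));
        System.out.printf("linear: %.0f ns, %.1f comparisons%n", r[0], r[1]);
        System.out.printf("binary: %.0f ns, %.1f comparisons%n", r[2], r[3]);
    }
}
```

The comparison counts are deterministic, but the times from a single System.nanoTime() pair around one short call are noisy, so the timing figures should only be read as averages over many searches.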
I found out that you must be careful in choosing which comparison methods to use: if you Arrays.sort(...) a list and you use compareToIgnoreCase in binary search, it fails! By failing I mean that it cannot find the word from the given list even though the word actually exists there. That is because Arrays.sort(...) is a case-sensitive sorter for Strings. If you use that, you must use the compareTo(...) method with it.
So, we have two cases:
- compareToIgnoreCase
- compareTo
In addition to these options in the binary search, you also have options in the linear search: whether to use equals or equalsIgnoreCase. I ran my test for all of these cases and compared them. Average results:
- equals: time: 725536 ns; comparisons: 117941; time / comparison: 6.15 ns
- equalsIgnoreCase: time: 1064334 ns; comparisons: 117940; time / comparison: 9.02 ns
- compareToIgnoreCase: time: 1619 ns; comparisons: 16; time / comparison: 101.19 ns
- compareTo: time: 763 ns; comparisons: 16; time / comparison: 47.69 ns
So, now we can clearly see your problem: the compareToIgnoreCase method takes some 16 times as much time as the equals method! Because, on average, it takes the binary search 16 comparisons to find the given word, you can perform about 263 linear comparisons in that time (1619 ns / 6.15 ns), or about 124 if the binary search uses compareTo (763 ns / 6.15 ns). So if you test with word lists shorter than that, the linear search is, indeed, always faster than the binary search due to the different methods they are using.
I actually also counted the number of words that the linear search was able to find faster than the binary search: 164 when using the compareTo method and 614 when using the compareToIgnoreCase method. Of the list of 235882 words, that’s at most about 0.3 percent. So all in all I think it’s still safe to say that the binary search is faster than the linear search. 🙂
One last point before you ask: I removed the sorting code from the binarySearch method, because that’s actually an entirely different thing. Since you are comparing two searching algorithms, it’s not fair to one of them if you add the cost of a sorting algorithm to its figures. I posted the following as a comment in another answer already, but I’ll copy it here for completeness:
Binary search has the added overhead cost of sorting. So if you only need to find one element from an array, linear search is always faster, because sorting takes at least O(n log n) time and then a binary search takes O(log n) time, dominated by the O(n log n) operation. A linear search performs in O(n) time, which is better than O(n log n). But once you have the array sorted, O(log n) is way better than O(n).
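The sort-order pitfall described earlier (Arrays.sort followed by compareToIgnoreCase) can be reproduced with a small sketch. The class SortOrderMismatch, its binarySearch helper and the five sample words are made up for illustration; the helper takes a Comparator so the same loop can run with either ordering.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortOrderMismatch {

    // Plain binary search; only correct if 'sorted' is ordered by the same 'order'.
    static int binarySearch(String[] sorted, String word, Comparator<String> order) {
        int low = 0, high = sorted.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = order.compare(sorted[mid], word);
            if (cmp == 0) return mid;
            if (cmp < 0) low = mid + 1; else high = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "Banana", "cherry", "Date", "elder"};

        // Case-sensitive sort: uppercase letters come before lowercase ones,
        // giving Banana, Date, apple, cherry, elder.
        String[] caseSensitive = words.clone();
        Arrays.sort(caseSensitive);

        // Case-insensitive sort: apple, Banana, cherry, Date, elder.
        String[] caseInsensitive = words.clone();
        Arrays.sort(caseInsensitive, String.CASE_INSENSITIVE_ORDER);

        // Mismatch: case-sensitive sort, case-insensitive comparison. "Date" is missed.
        System.out.println(binarySearch(caseSensitive, "Date", String.CASE_INSENSITIVE_ORDER)); // -1
        // Matching pairs both find the word.
        System.out.println(binarySearch(caseSensitive, "Date", Comparator.naturalOrder()));       // 1
        System.out.println(binarySearch(caseInsensitive, "Date", String.CASE_INSENSITIVE_ORDER)); // 3
    }
}
```

String.CASE_INSENSITIVE_ORDER orders strings the same way compareToIgnoreCase does, so sorting with it is one way to keep the sort and the search consistent.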
If you insist on having the sorting command in the binarySearch method, you should be aware that with my setup sorting that long list of words from an initially random order takes more than 140000000 ns, or 0.14 seconds, on average. In that time you could perform some 23000000 comparisons using the equals method, so you really should not use binary search if a) your array is in a random order and b) you only ever need to find just one or a couple of elements from there.
And one more thing. In this example, where you are searching for words in a String array, the cost of accessing an item in the array is negligible because the array is saved in the fast main memory of the computer. But if you had, say, a huge bunch of ordered files and you tried to find something from them, then the cost of accessing a single file would make the cost of every other calculation negligible instead. So binary search would totally rock in that scenario (too).
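To put a rough number on the sorting trade-off above, here is a small worked calculation using the averages reported in this answer (sorting: 140000000 ns; linear search with equals: 725536 ns per search; binary search with compareTo: 763 ns per search). The class SortBreakEven and its method are hypothetical helpers, and the result only applies to this word list and this setup.

```java
final class SortBreakEven {

    /**
     * Smallest number of searches k for which sorting once and then doing
     * k binary searches is cheaper than k linear searches, i.e.
     * sortNs + k * binaryNs < k * linearNs.
     * Assumes linearNs > binaryNs.
     */
    static long breakEvenSearches(long sortNs, long linearNs, long binaryNs) {
        long savedPerSearch = linearNs - binaryNs;
        return sortNs / savedPerSearch + 1; // integer division rounds down, +1 makes the inequality strict
    }

    public static void main(String[] args) {
        long k = breakEvenSearches(140_000_000L, 725_536L, 763L);
        System.out.println("binary search pays off from " + k + " searches"); // 194 with these figures
    }
}
```

With these figures, sorting first only pays off after roughly two hundred searches on the same array, which matches the advice to skip it for one or a couple of lookups.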