I am not sure why List as a general data structure should have binary search algorithm given the list is sorted. Doesn’t the get method which accepts the index traverse the List sequentially, at least for List‘s subtype LinkedList? If so, I do not see any advantage of using binarySearch comparing with sequential comparison for LinkedList. Of course unless we restrict List to be ArrayList we can do binarySearch with more confidence.
Is my understanding correct? Thanks.
There are many ways to implement the
List. There’sArrayList,LinkedList,CopyOnWriteArrayList, etc. in the standard Java libraries, and a host of other implementations beyond those (VLists, circular buffers, skew binomial lists, extendible arrays, 2-3 finger trees, etc.). The idea behind providing binary search is that while not all List implementations support random access, the ones that do would benefit from having a generic implementation of binary search available so that the authors of each data structure don’t have to reimplement it from scratch. For example, if I implement a crazy new list structure that supports random access, if I implement the List interface I can automatically get a binary search available from the Collections class.Interestingly, the
binarySearchmethod is written such that it looks at the type of theListand sees whether it implements theRandomAccessinterface before it actually does the binary search. If the list doesn’t implementRandomAccess, the instead of using a standard binary search, the method uses a modified binary search with iterators that is guaranteed to make at most O(n) iterations and O(log n) comparisons. The idea is to keep track of where the last probe landed, then to walk forward or backward the appropriate number of steps to find the next probe location, etc. The total work done is then at most n/2 + n/4 + n/8 + n/16 + … = 2n, so in the worst-case it’s only twice as bad as a worst-case linear search.In short, providing a generic implementation of
binarySearchdoesn’t always make it possible to quickly search a List for something, but for the structures that do support fast access it can make a huge difference and save a lot of implementer time. Additionally, having the graceful degradation to the modified binary search that runs in O(n) time means that the implementation isn’t ever going to be too much worse than a standard linear scan.This reasoning is similar to the reasoning behind the design of the C++ algorithms, which operate on generic ranges of values. The efficiency of these algorithms might be much worse than a specialized version of the algorithm on a per-data-structure basis, but having the general version available means that any new containers that support iterators can automatically have a lot of functionality available beyond just what’s specified in the interface.
Hope this helps!