I was trying to solve this:
Given a sorted array that contains continuous integers starting from 0 (any integer may be repeated many times), e.g. 0,0,0,1,2,3,3,3,4,4 (it can be much longer; this is just an example), efficiently find the starting and ending indices of a given integer.
I am thinking of using
1) traversal (complexity = O(n))
2) a modified binary search (complexity = O(log n)), where n is the length of the whole array.
Then I was wondering whether the continuous-integers property could be used to solve it.
Any different ideas or suggestions?
To begin, let’s ignore the “continuity” property.
As long as the problem is about finding the most efficient way to handle a single individual request, the straightforward general solution would be to perform two consecutive binary searches: the first one finds the beginning of the sequence, the second one finds the end of the sequence. The second search is performed in the remainder of the array, i.e. to the right of the previously found beginning of the sequence.
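A minimal sketch of the two-binary-search approach, assuming Python and its standard bisect module. The function name find_range is made up for illustration; it is not from the question.

```python
from bisect import bisect_left, bisect_right

def find_range(arr, target):
    """Return (first, last) indices of target in sorted arr, or None if absent."""
    # First binary search: leftmost position where target could be inserted.
    first = bisect_left(arr, target)
    if first == len(arr) or arr[first] != target:
        return None
    # Second binary search, restricted to the part of the array
    # to the right of the beginning that was just found.
    last = bisect_right(arr, target, lo=first) - 1
    return first, last

arr = [0, 0, 0, 1, 2, 3, 3, 3, 4, 4]
print(find_range(arr, 3))
```

Both searches are O(log n), and this version does not rely on the continuity property at all.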
However, if you somehow know that the average length of the sequence is relatively small, then it begins to make sense to replace the second binary search with a linear search. (This is the same principle that works when merging two sorted sequences of similar length: linear search outperforms binary search, because the structure of the input guarantees that on average the target of the search is located close to the beginning of the sequence).
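For comparison, here is a sketch of the variant described above, where the second binary search is replaced by a linear scan. The name find_range_linear_end is hypothetical, and the sketch assumes runs are short enough for the scan to pay off.

```python
from bisect import bisect_left

def find_range_linear_end(arr, target):
    """Binary search for the start of the run, linear scan for its end."""
    first = bisect_left(arr, target)
    if first == len(arr) or arr[first] != target:
        return None
    # Walk right while the next element still equals target.
    # This takes about (run length) steps, which is cheap for short runs.
    last = first
    while last + 1 < len(arr) and arr[last + 1] == target:
        last += 1
    return first, last
```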
More formally, if the length of the whole array is n and the number of different integer values in the array (variety metric) is k, then linear search begins to outperform binary search on average when n/k becomes smaller than log2(n) (some implementation-dependent constant factors might be needed to come up with a practical relationship).
The extreme example that illustrates this effect is the situation when n = k, i.e. when all values in the array are different. Obviously, using linear search to find the end of each sequence (once you know the beginning) will be vastly more efficient than using binary search.
But that’s something that requires extra knowledge about the properties of the input array: we need to know k.
And this is when your “continuity” property comes into play!
Since the numbers are continuous, the last value in the array minus the first value in the array is equal to k-1, meaning that k = array[n-1] - array[0] + 1.
This rule can also be applied to any sub-array of your original array to calculate the variety metric for that sub-array.
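As a rough sketch, the variety metric and the n/k versus log2(n) comparison could look like this in Python. The names variety and linear_is_cheaper are made up for illustration, and the threshold ignores the implementation-dependent constant factors mentioned earlier.

```python
from math import log2

def variety(arr, lo=0, hi=None):
    """Number of distinct values in arr[lo:hi].

    Assumes arr is sorted and contains every integer between its
    first and last element (the "continuity" property).
    """
    if hi is None:
        hi = len(arr)
    if lo >= hi:
        return 0
    return arr[hi - 1] - arr[lo] + 1

def linear_is_cheaper(n, k):
    """Heuristic only: average run length n/k versus about log2(n) steps."""
    return n / k < log2(n)

arr = [0, 0, 0, 1, 2, 3, 3, 3, 4, 4]
k = variety(arr)                         # whole array
k_right = variety(arr, 5)                # sub-array arr[5:]
print(k, k_right, linear_is_cheaper(len(arr), k))
```

Computing the metric costs two array reads, so it can be re-evaluated for every sub-array without affecting the overall complexity.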
That already gives you a very viable and efficient algorithm for finding the sequence: first perform a binary search for the beginning of the sequence, and then perform either binary or linear search depending on the relationship between n and k (or, even better, between the length of the right sub-array and the variety metric of the right sub-array).
P.S. The same technique can be applied to the first search as well. If you are looking for the sequence of i, then you immediately know that it is the j-th sequence in the array, where j = i - array[0]. That means that the linear search for the beginning of that sequence will take j * n/k steps on average. If this value is smaller than log2(n), linear search might be a better idea than binary search.
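Putting the pieces together, one possible sketch of the adaptive version in Python follows. The name find_range_adaptive is hypothetical, the thresholds are the raw n/k versus log2(n) comparisons with no tuned constants, and the code assumes the input really is sorted and continuous.

```python
from bisect import bisect_left, bisect_right
from math import log2

def find_range_adaptive(arr, target):
    """Return (first, last) indices of target, picking linear or binary search per step."""
    n = len(arr)
    # With continuity, target is present exactly when it lies in [arr[0], arr[-1]].
    if n == 0 or not arr[0] <= target <= arr[-1]:
        return None

    k = arr[-1] - arr[0] + 1      # variety metric of the whole array
    j = target - arr[0]           # target's run is the j-th one (0-based)

    # First search: expected linear cost is about j * n/k steps.
    if j * n / k < log2(n):
        first = 0
        while first < n and arr[first] < target:
            first += 1
    else:
        first = bisect_left(arr, target)
    if first == n or arr[first] != target:
        return None               # only reachable if the input breaks the assumptions

    # Second search: use the length and variety metric of the right sub-array.
    m = n - first
    k_right = arr[-1] - arr[first] + 1
    if m / k_right < log2(m):
        last = first
        while last + 1 < n and arr[last + 1] == target:
            last += 1
    else:
        last = bisect_right(arr, target, lo=first) - 1
    return first, last
```

Whether the linear branches actually win in practice depends on constant factors, so the thresholds are worth measuring on real data.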