Given a long sequence of N (not necessarily distinct) numbers, say
{1, 50, 3, 99, 1, 2, 100, 99, 4, 100, 4, 100} (could be very long)
and a small set of M ordered pairs, say
(1, 2)
(2, 1)
(1, 3)
(99, 50)
(99, 100)
I would like to count how many times each ordered pair occurs in the list (the two elements may be separated by other elements, but order matters). For example, the counts for the pairs above would be:
(1, 2): 2 (each 1 pairs with the later 2)
(2, 1): 0 (no 1's come after the 2)
(1, 3): 1 (only one of the 1's comes before the 3)
(99, 50): 0 (no 99's come before the 50)
(99, 100): 5 (3 times for the first 99 and 2 times for the second)
Assuming that every number in the ordered pairs is guaranteed to appear in the list, does there exist an algorithm to extract these counts faster than the naive O(N * M) time (achieved by brute force searching for each ordered pair)?
As a side question, might there be a fast algorithm if we restrict ourselves to boolean occurrences only instead of counts? That is:
(1, 2): yes
(2, 1): no
(1, 3): yes
(99, 50): no
(99, 100): yes
Any help would be appreciated.
For the boolean version, keep two hash tables: one mapping each number to the least position at which it occurs, and one mapping each number to the greatest position at which it occurs. The ordered pair (a, b) appears in order if and only if least[a] < greatest[b] (and both hash keys are present). Preprocessing time is linear, space usage is linear, and query time is constant (under standard assumptions about the complexity of hashing).
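Here is a minimal Python sketch of that idea. The function names `build_extremes` and `occurs_in_order` are made up for illustration, and positions are assumed to be 0-based list indices.

```python
def build_extremes(seq):
    """Map each value to its first and last position in seq (one pass)."""
    least, greatest = {}, {}
    for i, x in enumerate(seq):
        least.setdefault(x, i)  # only the first occurrence is recorded
        greatest[x] = i         # overwritten until the last occurrence
    return least, greatest


def occurs_in_order(least, greatest, a, b):
    """True if some a appears strictly before some b."""
    return a in least and b in greatest and least[a] < greatest[b]


seq = [1, 50, 3, 99, 1, 2, 100, 99, 4, 100, 4, 100]
least, greatest = build_extremes(seq)
for pair in [(1, 2), (2, 1), (1, 3), (99, 50), (99, 100)]:
    print(pair, "yes" if occurs_in_order(least, greatest, *pair) else "no")
```

Note that for a == b the test least[a] < greatest[a] holds exactly when the value occurs at least twice, so no special case should be needed here.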
As for the counting version, the best I can think of is to keep one hash table mapping each number to the list of positions at which it occurs, in sorted order. To query a pair (a, b), "merge" the position lists of a and b, keeping track of the number of a-elements seen so far and the number of pair occurrences. When a b-element is selected to be next, increment the number of pairs by the number of a-elements. When an a-element is selected to be next, increment the number of a-elements. (If a == b, return length choose 2.) Each query then takes time proportional to the combined length of the two position lists.
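A rough Python sketch of the counting query follows. The names `build_positions` and `count_pair` are hypothetical, and the merge is written as a two-pointer walk: for each position of b, the pointer into a's list is advanced past every earlier a, so the pointer value is the number of a-elements seen so far.

```python
from collections import defaultdict


def build_positions(seq):
    """Map each value to the sorted list of positions where it occurs."""
    positions = defaultdict(list)
    for i, x in enumerate(seq):
        positions[x].append(i)  # indices arrive in increasing order
    return positions


def count_pair(positions, a, b):
    """Count index pairs (i, j) with i < j, seq[i] == a, seq[j] == b."""
    pos_a = positions.get(a, [])
    pos_b = positions.get(b, [])
    if a == b:
        n = len(pos_a)
        return n * (n - 1) // 2  # "length choose 2"
    pairs = 0
    seen_a = 0  # number of a-positions already merged
    for j in pos_b:
        # Consume every a that comes before this b.
        while seen_a < len(pos_a) and pos_a[seen_a] < j:
            seen_a += 1
        pairs += seen_a  # each earlier a pairs with this b
    return pairs


seq = [1, 50, 3, 99, 1, 2, 100, 99, 4, 100, 4, 100]
positions = build_positions(seq)
for pair in [(1, 2), (2, 1), (1, 3), (99, 50), (99, 100)]:
    print(pair, count_pair(positions, *pair))
```

On the example sequence from the question this should print the counts 2, 0, 1, 0 and 5. Using `positions.get` rather than indexing avoids silently adding empty entries to the `defaultdict` for values that never occur.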