The problem is easy to explain: we have two big arrays of 32-bit integer values, and we have to find all common sequences longer than a given number of consecutive positions (n).
For instance, if n = 3 and the arrays to compare are:
a = [1, 3, 5, 7, 3, 2, 7, 4, 6, 7, 2, 1, 0, 4, 6]
b = [2, 5, 7, 3, 2, 3, 4, 5, 6, 3, 2, 7, 4, 6, 0]
The algorithm should return two arrays:
r0 = [5, 7, 3, 2]
r1 = [3, 2, 7, 4, 6]
(or better, their positions relative to the first array and the number of consecutive elements matched).
I believe a good starting point is the Longest Common Substring algorithm, but perhaps somebody knows an algorithm that fits my problem better, or even exactly.
I think the algorithm for finding the LCS using a suffix tree is a perfect fit. You build the suffix tree the same way, but in the final phase, you’re not looking for the deepest node that has descendants for both strings. You’re looking for all nodes with a depth of more than n that have descendants for both strings.
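To make the expected output concrete, here is a minimal sketch in Python. Note that it is not the suffix tree approach: it swaps in a much simpler diagonal scan that takes O(len(a) * len(b)) time, which may be too slow for really big arrays but is handy for checking a suffix tree implementation against. The function name common_runs and the (start_in_a, start_in_b, length) result format are my own assumptions, and I am assuming you only want maximal runs, i.e. ones that cannot be extended in either direction.

```python
def common_runs(a, b, n):
    """Return (start_in_a, start_in_b, length) for every maximal
    common run of consecutive elements that is longer than n.

    Hypothetical helper: a quadratic-time diagonal scan,
    not a suffix tree.
    """
    results = []
    # Each diagonal fixes the offset d = i - j between a position i in a
    # and a position j in b. A common run always lies on one diagonal.
    for d in range(-(len(b) - 1), len(a)):
        i, j = max(d, 0), max(-d, 0)
        run = 0  # length of the current run of equal elements
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                run += 1
            else:
                # The run ended just before (i, j); keep it if long enough.
                if run > n:
                    results.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may also end at the edge of either array.
        if run > n:
            results.append((i - run, j - run, run))
    return sorted(results)


a = [1, 3, 5, 7, 3, 2, 7, 4, 6, 7, 2, 1, 0, 4, 6]
b = [2, 5, 7, 3, 2, 3, 4, 5, 6, 3, 2, 7, 4, 6, 0]
for start_a, start_b, length in common_runs(a, b, 3):
    print(start_a, start_b, a[start_a:start_a + length])
# prints:
# 2 1 [5, 7, 3, 2]
# 4 9 [3, 2, 7, 4, 6]
```

If the same sequence occurs at several positions, this sketch reports one tuple per pair of positions. With the suffix tree you would get the same matches by collecting, for each qualifying node, the leaf positions from both arrays.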