I am looking for a fast algorithm:
I have a int array of size n, the goal is to find all patterns in the array that
x1, x2, x3 are different elements in the array, such that x1+x2 = x3
For example I know there’s a int array of size 3 is [1, 2, 3] then there’s only one possibility: 1+2 = 3 (consider 1+2 = 2+1)
I am thinking about implementing Pairs and Hashmaps to make the algorithm fast. (the fastest one I got now is still O(n^2))
Please share your idea for this problem, thank you
Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you’re not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).
This is basically the 3SUM problem. Without potentially unboundedly large elements, the best known algorithms are approximately
O(n^2), but we’ve only proved that it can’t be faster thanO(n lg n)for most models of computation.If the integer elements lie in the range
[u, v], you can do a slightly different version of this inO(n + (v-u) lg (v-u))with an FFT. I’m going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence
a,b,cwithc - b = b - a, or equivalently,a + c = 2b.Unfortunately, the last step of the transformation back isn’t as fast as I’d like, but I’ll talk about that when we get there.
Let’s call your original array
X, which contains integersx_1, ..., x_n. We want to find indicesi,j,ksuch thatx_i + x_j = x_k.Find the minimum
uand maximumvofXinO(n)time. Letu'bemin(u, u*2)andv'bemax(v, v*2).Construct a binary array (bitstring)
Zof lengthv' - u' + 1;Z[i]will be true if eitherXor its double[x_1*2, ..., x_n*2]containsu' + i. This isO(n)to initialize; just walk over each element ofXand set the two corresponding elements ofZ.As we’re building this array, we can save the indices of any duplicates we find into an auxiliary list
Y. OnceZis complete, we just check for2 * x_ifor eachx_iinY. If any are present, we’re done; otherwise the duplicates are irrelevant, and we can forget aboutY. (The only situation slightly more complicated is if0is repeated; then we need three distinct copies of it to get a solution.)Now, a solution to your problem, i.e.
x_i + x_j = x_k, will appear inZas three evenly-spaced ones, since some simple algebraic manipulations give us2*x_j - x_k = x_k - 2*x_i. Note that the elements on the ends are our special doubled entries (from2X) and the one in the middle is a regular entry (fromX).Consider
Zas a representation of a polynomialp, where the coefficient for the term of degreeiisZ[i]. IfXis[1, 2, 3, 5], thenZis1111110001(because we have 1, 2, 3, 4, 5, 6, and 10);pis then 1 + x + x2 + x3 + x4 + x5 + x9.Now, remember from high school algebra that the coefficient of xc in the product of two polynomials is the sum over all a, b with a + b = c of the first polynomial’s coefficient for xa times the second’s coefficient for xb. So, if we consider q = p2, the coefficient of x2j (for a j with
Z[j] = 1) will be the sum over all i ofZ[i] * Z[2*j - i]. But sinceZis binary, that’s exactly the number of triplets i,j,k which are evenly-spaced ones inZ. Note that (j, j, j) is always such a triplet, so we only care about ones with values > 1.We can then use a Fast Fourier Transform to find p2 in
O(|Z| log |Z|)time, where|Z|isv' - u' + 1. We get out another array of coefficients; call itW.Loop over each
x_kinX. (Recall that our desired evenly-spaced ones are all centered on an element ofX, not2*X.) If the correspondingWfor twice this element, i.e.W[2*(x_k - u')], is 1, we know it’s not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)Otherwise, it might be the center of a progression that we want (so we need to find
iandj). But, unfortunately, it might also be the center of a progression that doesn’t have our desired form. So we need to check. Loop over the other elementsx_iofX, and check if there’s a triple with2*x_i,x_k,2*x_jfor somej(by checkingZ[2*(x_k - x_j) - u']). If so, we have an answer; if we make it through all ofXwithout a hit, then the FFT found only spurious answers, and we have to check another element ofW.This last step is therefore O(n * 1 + (number of x_k with W[2*(x_k – u’)] > 1 that aren’t actually solutions)), which is maybe possibly
O(n^2), which is obviously not okay. There should be a way to avoid generating these spurious answers in the outputW; if we knew that any appropriateWcoefficient definitely had an answer, this last step would beO(n)and all would be well.I think it’s possible to use a somewhat different polynomial to do this, but I haven’t gotten it to actually work. I’ll think about it some more….
Partially based on this answer.