I need to perform structural comparison on two Object[] arrays which may contain themselves:
Object[] o1 = new Object[] { "A", null };
o1[1] = o1;
Object[] o2 = new Object[] { "A", null };
o2[1] = o2;
Arrays.deepEquals(o1, o2); // undefined behavior
Unfortunately, deepEquals doesn’t work in this case. The example above should yield true.
Is there an algorithm which can reliably calculate this?
My idea is roughly as follows:
List<Object> xs = new ArrayList<>();
List<Object> ys = new ArrayList<>();
boolean equal(Object[] o1, Object[] o2, List<Object> xs, List<Object> ys) {
    xs.add(o1);
    ys.add(o2);
    boolean result = true;
    for (int i = 0; i < o1.length; i++) {
        if (o1[i] instanceof Object[]) {
            int idx1 = xs.lastIndexOf(o1[i]);
            if (idx1 >= 0) { idx1 = xs.size() - idx1 - 1; }
            if (o2[i] instanceof Object[]) {
                int idx2 = xs.lastIndexOf(o2[i]);
                if (idx2 >= 0) { idx2 = ys.size() - idx2 - 1; }
                if (idx1 == idx2) {
                    if (idx1 >= 0) {
                        continue;
                    }
                    if (!equal(o1[i], o2[i], xs, ys)) {
                        result = false;
                        break;
                    }
                }
            }
        }
    }
    xs.removeLast();
    ys.removeLast();
    return result;
}
As I mentioned in my comments above, your code has some compile errors, and you’ve left out a lot of it, which makes it hard to be 100% sure of exactly how it’s supposed to work once the code is completed. But after finishing the code, fixing one clear typo (you wrote idx2 = xs.lastIndexOf(o2[i]), but I’m sure you meant idx2 = ys.lastIndexOf(o2[i])) and one thing that I think is a typo (I don’t think that you meant for if (!equal(o1[i], o2[i], xs, ys)) to be nested inside if (idx1 == idx2)), removing some no-op code, and restructuring a bit (to a style that I find clearer; YMMV), I get this:

which mostly works. The logic is, whenever it gets two Object[]s, it checks to see if it’s currently comparing each of them higher up in the stack and, if so, it checks to see if the topmost stack-frame that’s comparing one of them is also the topmost stack-frame that’s comparing the other. (That is the logic you intended, right?)

The only serious bug I can see is in this sort of situation:
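A minimal sketch of the kind of input meant here, inferred from the description that follows: a is an array containing only itself, while b contains a second array that in turn contains b. The class and method names (MismatchedCycles, selfLoop, twoCycle) are hypothetical and only serve the illustration.

```java
final class MismatchedCycles {

    /** Builds a: an array whose only element is itself (a cycle of length 1). */
    static Object[] selfLoop() {
        Object[] a = new Object[1];
        a[0] = a;
        return a;
    }

    /** Builds b: b[0] is a second array whose only element is b (a cycle of length 2). */
    static Object[] twoCycle() {
        Object[] b = new Object[1];
        Object[] inner = new Object[] { b };
        b[0] = inner;
        return b;
    }
}
```

Both arrays unfold to the same endless nesting of one-element arrays, so a structural comparison should arguably treat them as equal, yet their cycles have different lengths.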
You see, in the above, the last element of xs will always be a, but the last element of ys will alternate between b and b[0]. In each recursive call, xs.lastIndexOf(a) will always be the greatest index of xs, while ys.lastIndexOf(b) or ys.lastIndexOf(b[0]) (whichever one is needed) will always be one less than the greatest index of ys.

The problem is, the logic shouldn’t be, “the topmost comparison of o1[i] is in the same stack-frame as the topmost comparison of o2[i]”; rather, it should be, “there exists some stack-frame, any stack-frame at all, that is comparing o1[i] to o2[i]”. But for efficiency, we can actually use the logic “there is, or has ever been, a stack-frame that is/was comparing o1[i] to o2[i]”; and we can use a Set of pairs of arrays rather than two Lists of arrays. To that end, I wrote this:

It should be clear that the above cannot result in infinite recursion, because if the program has a finite number of arrays, then it has a finite number of pairs of arrays, and only one stack-frame at a time can be comparing a given pair of arrays (since, once the comparison of a pair begins, the pair is added to pairs, and any future attempt to compare that pair will immediately return true), which means that the total stack depth is finite at any given time. (Of course, if the number of arrays is huge, then the above can still overflow the stack; the recursion is bounded, but so is the maximum stack size. I’d recommend, actually, that the for-loop be split into two for-loops, one after the other: the first time, skip all the elements that are arrays, and the second time, skip all the elements that aren’t. This can avoid expensive comparisons in many cases.)

It should also be clear that the above will never return false when it should return true, since it only returns false when it finds an actual difference.

Lastly, I think it should be clear that the above will never return true when it should return false, since for every pair of objects, one full loop is always made over all the elements. This part is trickier to prove, but in essence, we’ve defined structural equality in such a way that two arrays are only structurally unequal if we can find some difference between them; and the above code does eventually examine every element of every array it encounters, so if there were a findable difference, it would find it.

Notes:
- The above code gives no special treatment to arrays of primitives: int[] and double[] and so on. Adam’s answer raises the possibility that you would want these to be compared elementwise as well; if that’s needed, it’s easily added (since it wouldn’t require recursion: arrays of primitives can’t contain arrays), but the above code just uses Object.equals(Object) for them, which means reference-equality.

- The above code assumes that Object.equals(Object) implements a symmetric relation, as its contract specifies. In reality, however, that contract is not always fulfilled; for example, new java.util.Date(0L).equals(new java.sql.Timestamp(0L)) is true, while new java.sql.Timestamp(0L).equals(new java.util.Date(0L)) is false. If order matters for your purposes (if you want equal(new Object[]{new java.util.Date(0L)}, new Object[]{new java.sql.Timestamp(0L)}) to be true and equal(new Object[]{new java.sql.Timestamp(0L)}, new Object[]{new java.util.Date(0L)}) to be false), then you’ll want to change ArrayPair.equals(Object), and probably ArrayPair.hashCode() as well, to care about which array is which.
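The Set-of-pairs approach described in this answer can be sketched roughly as follows. This is a reconstruction under assumptions, not the original listing: the names equal, pairs and ArrayPair come from the text, but their bodies, the wrapper class CyclicArrays, the a == b shortcut and the length check are guesses at one reasonable implementation. ArrayPair is unordered here, matching the symmetry assumption in the notes.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class CyclicArrays {

    /** Unordered pair of arrays, compared by identity (assumed shape of ArrayPair). */
    private static final class ArrayPair {
        private final Object[] first;
        private final Object[] second;

        ArrayPair(Object[] first, Object[] second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof ArrayPair)) {
                return false;
            }
            ArrayPair that = (ArrayPair) other;
            // Order-insensitive: (x, y) equals (y, x).
            return (first == that.first && second == that.second)
                    || (first == that.second && second == that.first);
        }

        @Override
        public int hashCode() {
            // Symmetric in the two arrays, consistent with equals above.
            return System.identityHashCode(first) + System.identityHashCode(second);
        }
    }

    /** Structural equality for non-null Object[] arrays that may contain themselves. */
    static boolean equal(Object[] a, Object[] b) {
        return equal(a, b, new HashSet<>());
    }

    private static boolean equal(Object[] a, Object[] b, Set<ArrayPair> pairs) {
        if (a == b) {
            return true;
        }
        if (a.length != b.length) {
            return false;
        }
        // If this pair is being compared, or ever was, do not descend again.
        if (!pairs.add(new ArrayPair(a, b))) {
            return true;
        }
        for (int i = 0; i < a.length; i++) {
            Object x = a[i];
            Object y = b[i];
            if (x instanceof Object[] && y instanceof Object[]) {
                if (!equal((Object[]) x, (Object[]) y, pairs)) {
                    return false;
                }
            } else if (!Objects.equals(x, y)) {
                // Covers nulls, non-arrays, and primitive arrays (reference equality).
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, CyclicArrays.equal(o1, o2) on the two self-containing arrays from the question returns true without recursing forever. To make the comparison order-sensitive, one would drop the swapped case in ArrayPair.equals and use an asymmetric hashCode.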