I have a large data.frame with 12 columns and many rows, but let's simplify:
Id A1 A2 B1 B2 Result
1 55 23 62 12 1
2 23 55 12 62 1 * (dup of Id 1)
3 23 6 2 62 1
4 23 55 62 12 1 * (dup of Id 1)
5 12 62 55 23 0 * (dup of Id 1)
6 ...
The order within the A's (A1, A2) and within the B's (B1, B2) does not matter. If two rows have the same A values, e.g. (55, 23), and the same B values, e.g. (62, 12), they are duplicates, regardless of the order within each pair.
Furthermore, if A_id_x = B_id_y, B_id_x = A_id_y, and Result_id_x = 1 - Result_id_y, rows x and y are also duplicates.
How does one go about cleaning this frame of duplicates?
I ended up using Excel VBA to solve the problem.
This was the procedure:
Sort the A values and the B values within each row
Then, for rows with Result = 0, swap A and B and change Result to 1
Remove duplicate rows
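
For anyone who would rather stay in R, the same three steps could be sketched in base R roughly as follows. This is only a minimal sketch, not the VBA code I actually used. It assumes the simplified column names from the question (A1, A2, B1, B2, Result), that Result only ever takes the values 0 and 1, and that the first occurrence of each duplicate group is the one to keep. The names `dedupe_pairs` and `canon` are made up for illustration, and row 5 of the toy data is entered with A = (12, 62) so that it mirrors Id 1, as its dup marker implies.

```r
# Toy data from the question (row 5 entered with A = (12, 62); see the note above).
df <- data.frame(
  Id     = 1:5,
  A1     = c(55, 23, 23, 23, 12),
  A2     = c(23, 55,  6, 55, 62),
  B1     = c(62, 12,  2, 62, 55),
  B2     = c(12, 62, 62, 12, 23),
  Result = c( 1,  1,  1,  1,  0)
)

# Hypothetical helper: builds a canonical form of every row and drops
# rows whose canonical form has already been seen.
dedupe_pairs <- function(d) {
  # Step 1: sort within each pair, so the order inside A and inside B is irrelevant.
  a_lo <- pmin(d$A1, d$A2); a_hi <- pmax(d$A1, d$A2)
  b_lo <- pmin(d$B1, d$B2); b_hi <- pmax(d$B1, d$B2)

  # Step 2: for rows with Result == 0, swap A and B and treat Result as 1.
  flip <- d$Result == 0
  canon <- data.frame(
    A1     = ifelse(flip, b_lo, a_lo),
    A2     = ifelse(flip, b_hi, a_hi),
    B1     = ifelse(flip, a_lo, b_lo),
    B2     = ifelse(flip, a_hi, b_hi),
    Result = rep(1, nrow(d))
  )

  # Step 3: keep only the first row of each group of identical canonical rows.
  d[!duplicated(canon), , drop = FALSE]
}

cleaned <- dedupe_pairs(df)
print(cleaned)  # keeps Id 1 and Id 3
```

Note that this version returns the surviving rows in their original form (unsorted, Result untouched) and only uses the canonical form as a comparison key. For the real 12-column frame, the remaining columns would either have to be added to `canon` or ignored, depending on whether they should count towards a duplicate.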