I’m trying to find the (set) intersection between two columns in the same table in MySQL. I basically want to find the rows that have either a col1 element that is in the table’s col2, or a col2 element that is in the table’s col1.
Initially I tried:
SELECT * FROM table WHERE col1 IN (SELECT col2 FROM table)
which was syntactically valid, however the run-time is far too high. The number of rows in the table is ~300,000 and the two columns in question are not indexed. I assume the run time is either n^2 or n^3 depending on whether MySQL executes the subquery again for each element of the table or if it stores the result of the subquery temporarily.
Next I thought of taking the union of the two columns and removing distinct elements, because if an element shows up more than once in this union then it must have been present in both columns (assuming both columns contain only distinct elements).
Is there a more elegant (i.e. faster) way to find the set intersection between two columns of the same table?
Creating indexes on
col1andcol2would go a long way to help this query as well.