I’ve got two tables with similar structure:
– First table: id and col1,col2,col3 – all numerics.
– Second table: id and col4,col5,col6 – all numerics.
I want to remove from the first table all rows that are similar to any row in the second table. I consider two rows similar when any column in the group col1-col3 equals any column in the group col4-col6. Currently I do this in 9 consecutive data steps (the first checks whether col1=col4, the second col1=col5, …, the ninth col3=col6), which is probably not the optimal solution.
Any ideas how to improve this?
This is my solution:
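A minimal sketch of a macro-variable approach along these lines, not necessarily the code originally posted. The data set names `first`, `second`, `vals` and `first_clean` are assumptions for illustration, and missing values are assumed never to count as a match. All distinct values from col4-col6 are collected into the macro variable CVARS, and the first table is then filtered in a single data step:

```sas
/* Stack col4-col6 of the second table into one column of lookup values */
data vals (keep=val);
  set second;
  array c{3} col4-col6;
  do i = 1 to 3;
    val = c{i};
    if not missing(val) then output;   /* assumption: missing never matches */
  end;
run;

/* Put the distinct values into a comma-separated macro variable.        */
/* BEST32. avoids the shorter default numeric format used by INTO.       */
proc sql noprint;
  select distinct put(val, best32.)
    into :cvars separated by ','
  from vals;
quit;

/* Drop every row of the first table that shares a value with the list */
data first_clean;
  set first;
  if col1 in (&cvars) or col2 in (&cvars) or col3 in (&cvars) then delete;
run;
```

Note that this sketch fails with a syntax error if `vals` is empty, because CVARS would then resolve to nothing, and a macro variable can hold at most 65,534 characters.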
If you run into trouble with the length of the CVARS macro variable you could use this instead:
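A minimal sketch of one way to avoid the macro variable, not necessarily the code originally posted. It keeps the lookup values in a data set and tests membership with a PROC SQL subquery. The data set names `first`, `second`, `vals` and `first_clean` are assumptions, and missing values are assumed never to count as a match:

```sas
/* Stack col4-col6 of the second table into one column of lookup values */
data vals (keep=val);
  set second;
  array c{3} col4-col6;
  do i = 1 to 3;
    val = c{i};
    if not missing(val) then output;   /* assumption: missing never matches */
  end;
run;

/* Optional: remove duplicate values so the lookup table is smaller */
proc sort data=vals nodupkey;
  by val;
run;

/* Keep only the rows whose col1-col3 values never appear in the lookup */
proc sql;
  create table first_clean as
  select *
  from first
  where col1 not in (select val from vals)
    and col2 not in (select val from vals)
    and col3 not in (select val from vals);
quit;
```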
The PROC SORT could be eliminated, but it makes the step more efficient for big data sets.
Or you could generate a format on the fly:
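A minimal sketch of the format idea, not necessarily the code originally posted. The format name `insecond` and the data set names `first`, `second`, `vals`, `fmt` and `first_clean` are assumptions. The lookup values become a numeric format that maps every value found in col4-col6 to 'Y' and everything else to 'N':

```sas
/* Stack col4-col6 of the second table into one column of lookup values */
data vals (keep=val);
  set second;
  array c{3} col4-col6;
  do i = 1 to 3;
    val = c{i};
    if not missing(val) then output;   /* assumption: missing never matches */
  end;
run;

/* CNTLIN input must not contain overlapping ranges, so remove duplicates */
proc sort data=vals nodupkey;
  by val;
run;

/* Build the control data set: each value -> 'Y', all other values -> 'N' */
data fmt;
  length fmtname $8 label $1 hlo $1;
  retain fmtname 'insecond' type 'N';
  set vals (rename=(val=start)) end=last;
  label = 'Y';
  hlo = ' ';
  output;
  if last then do;
    hlo = 'O';        /* O = the OTHER range */
    start = .;
    label = 'N';
    output;
  end;
run;

proc format cntlin=fmt;
run;

/* Drop every row of the first table in which any column formats to 'Y' */
data first_clean;
  set first;
  if put(col1, insecond.) = 'Y'
     or put(col2, insecond.) = 'Y'
     or put(col3, insecond.) = 'Y' then delete;
run;
```

If `vals` is empty, the control data set is empty as well and no format is created, so this sketch assumes the second table contains at least one non-missing value.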
I suspect this last method would be faster than the two preceding solutions.
UPDATE
Using hash objects
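A minimal sketch of a hash-object version, not necessarily the code originally posted. The data set names `first`, `second`, `vals` and `first_clean` are assumptions, and missing values are assumed never to count as a match. The lookup values are loaded into an in-memory hash once, and each row of the first table is then checked against it:

```sas
/* Stack col4-col6 of the second table into one column of lookup values */
data vals (keep=val);
  set second;
  array c{3} col4-col6;
  do i = 1 to 3;
    val = c{i};
    if not missing(val) then output;   /* assumption: missing never matches */
  end;
run;

data first_clean (drop=val i);
  length val 8;                        /* host variable for the hash key */
  if _n_ = 1 then do;
    /* Load the lookup values once; duplicate keys are ignored on load */
    declare hash h(dataset: 'vals');
    h.defineKey('val');
    h.defineDone();
  end;
  set first;
  array c1{3} col1-col3;
  do i = 1 to 3;
    val = c1{i};
    if h.check() = 0 then delete;      /* 0 means the key was found */
  end;
run;
```

No sorting or de-duplication is needed here, since the hash keeps only one entry per key.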