I have the following MySQL table containing my raw event data (about 1.5 million rows):
userId | pathId | other stuff....
I have an index on userId, pathId (approx 50,000 unique combinations)
During my processing, I identify 30,000 userId, pathId values that I don’t want, but I do want to keep the original raw table. So I want to copy all rows into a processed event table, except the rows that match these 30,000 userId, pathId values.
An approach I’m considering is to write the 30,000 userId, pathId values of the rows I do not want into a temp_table, and then do something like this:
[create table processed_table ...]
insert into processed_table
select * from raw_table r
where not exists (
select * from temp_table t where r.userId=t.userid and r.pathId=t.pathId
)
For info, processed_table generally ends up being half the size of raw_table.
Anyway, this seems to work but my SQL skills are limited, so my question (finally) is – is this the most efficient way to do this?
No, it’s not the most efficient. Source
The two alternatives are NOT IN and LEFT JOIN ... IS NULL. However, since your table is fairly small (about 1.5 million rows, with only 50,000 unique userId, pathId combinations), your original query is probably fast enough.
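For reference, here is a rough sketch of how the NOT IN and LEFT JOIN ... IS NULL forms might look for the tables in the question. The column types (INT), the NOT NULL constraints, and the primary key on temp_table are assumptions, not something stated in the question; processed_table is assumed to have the same columns as raw_table, as in the original query. Which form is actually fastest can depend on the MySQL version, so it is worth checking each one with EXPLAIN on your own data.

```sql
-- Assumed definition of the exclusion table. The composite primary key
-- lets each lookup on (userId, pathId) use an index.
CREATE TEMPORARY TABLE temp_table (
    userId INT NOT NULL,
    pathId INT NOT NULL,
    PRIMARY KEY (userId, pathId)
);

-- Variant 1: NOT IN with a row constructor.
-- Only safe if neither column can be NULL, since NOT IN matches
-- no rows at all when the subquery returns a NULL.
INSERT INTO processed_table
SELECT r.*
FROM raw_table r
WHERE (r.userId, r.pathId) NOT IN (
    SELECT t.userId, t.pathId
    FROM temp_table t
);

-- Variant 2: LEFT JOIN ... IS NULL.
-- Rows of raw_table with no match in temp_table come back with
-- t.userId set to NULL, and only those rows are kept.
INSERT INTO processed_table
SELECT r.*
FROM raw_table r
LEFT JOIN temp_table t
       ON t.userId = r.userId
      AND t.pathId = r.pathId
WHERE t.userId IS NULL;
```

Run only one of the two INSERT statements; both should copy the same set of rows as the NOT EXISTS query in the question.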