Here is my setup:
Table records contains multiple (more than two) PKID columns along with some other columns.
Table cached_records only has two columns, which are the same as two of the PKIDs for records.
For instance, let’s assume records has PKIDs ‘keyA’, ‘keyB’, and ‘keyC’ and cached_records only has ‘keyA’ and ‘keyB’.
I need to pull the rows from the records table where the appropriate PKIDs (so, ‘keyA’ and ‘keyB’) are not in the cached_records table.
If I were working with only one PKID, I know how simple this task would be:
SELECT
    pkid
FROM
    records
WHERE
    pkid NOT IN (SELECT pkid FROM cached_records)
However, the fact that there are two PKIDs means I can’t use a simple NOT IN. This is what I currently have:
SELECT
    `keys`.`keyA` AS `keyA`,
    `keys`.`keyB` AS `keyB`
FROM
    (
        SELECT DISTINCT
            `keyA`,
            `keyB`
        FROM
            `records`
    ) AS `keys`
LEFT JOIN
    `cached_records` AS `cached`
ON
    `keys`.`keyA` = `cached`.`keyA`
    AND
    `keys`.`keyB` = `cached`.`keyB`
WHERE
    (
        `cached`.`keyA` IS NULL
        AND
        `cached`.`keyB` IS NULL
    )
(The DISTINCT is needed because I am only selecting two of the PKIDs from the records table, so there could be duplicates, which I don’t need; ‘keyC’ is not selected here, even though it is part of what makes each row in records unique.)
The query above works, but as the cached_records table grows it takes longer and longer to run (we’re talking minutes now, and sometimes long enough that my code hangs and crashes).
So, I’m wondering what the most efficient way is to do this kind of operation (selecting rows from one table where the rows don’t exist in another) with multiple PKIDs instead of just one.
This should be quicker:
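As a minimal sketch of one form this could take (an assumption, not necessarily the only or best option): a correlated NOT EXISTS against cached_records, backed by a composite index on the two shared key columns. The index name `idx_cached_keys` is hypothetical, and the index is redundant if cached_records already has a primary or unique key on (keyA, keyB).

```sql
-- Assumption: cached_records has no index covering (keyA, keyB) yet.
-- The index name idx_cached_keys is hypothetical; skip this statement
-- if a primary/unique key on these two columns already exists.
CREATE INDEX `idx_cached_keys` ON `cached_records` (`keyA`, `keyB`);

-- Return each (keyA, keyB) pair from records that has no matching
-- row in cached_records.
SELECT DISTINCT
    `r`.`keyA`,
    `r`.`keyB`
FROM
    `records` AS `r`
WHERE
    NOT EXISTS (
        SELECT 1
        FROM `cached_records` AS `c`
        WHERE `c`.`keyA` = `r`.`keyA`
          AND `c`.`keyB` = `r`.`keyB`
    );
```

With the composite index in place, each probe into cached_records can be an index lookup rather than a scan of the whole table. Whether this actually beats the LEFT JOIN ... IS NULL form depends on the MySQL version and the data, so it is worth comparing both with EXPLAIN.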
Notes:
Check that the keyA and keyB columns are of the same type in both tables, so that no conversion occurs in the comparison (I have seen such mismatches in working live code…)
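One way to check the column types, sketched here under the assumption that both tables live in the currently selected schema, is to compare their definitions in information_schema:

```sql
-- List the declared type and collation of the shared key columns
-- in both tables, so that any mismatch is visible side by side.
SELECT
    TABLE_NAME,
    COLUMN_NAME,
    COLUMN_TYPE,
    COLLATION_NAME
FROM
    information_schema.COLUMNS
WHERE
    TABLE_SCHEMA = DATABASE()
    AND TABLE_NAME IN ('records', 'cached_records')
    AND COLUMN_NAME IN ('keyA', 'keyB')
ORDER BY
    COLUMN_NAME,
    TABLE_NAME;
```

If the COLUMN_TYPE or COLLATION_NAME values differ between the two tables for the same column, the comparison may require a conversion, which can prevent an index from being used.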