We recently had an issue where a SQL script that removes duplicate entries from a table did not keep the most recent entry. I think this statement is the problem:
delete from vaccine_patient_details
where vacc_pat_guid <>
(Select top 1 vacc_pat_guid
from vaccine_patient_details as v
where v.patient_guid = patient_guid and
v.vaccine_guid = vaccine_guid
order by date_given desc)
Is that the correct syntax? I found another version of the script that works on a different table (names changed to match the first example):
delete from vaccine_patient_details
where vacc_pat_guid <>
(Select top 1 vacc_pat_guid
from vaccine_patient_details as v
where v.patient_guid = vaccine_patient_details.patient_guid and
v.vaccine_guid = vaccine_patient_details.vaccine_guid
order by date_given desc)
This one uses the name of the table being deleted from in the inner WHERE clause. Could leaving it out be causing the problem in my first version?
Details about the Table:
- Any columns that end in guid have a datatype of uniqueidentifier.
- vacc_pat_guid is the primary key and is unique.
- date_given is a datetime that could be null. If there is a duplicate where one is null and one is not null it should prefer the not null one.
Without an alias or table name qualifying the columns of the outer table, the first query is equivalent to:
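The snippet that originally followed appears to be missing. Below is a plausible reconstruction, assuming SQL Server (T-SQL) name resolution, where an unqualified column name inside the subquery binds to the innermost table that has such a column. Both sides of each comparison then refer to v, so the subquery is no longer tied to the row being deleted.

```sql
-- Reconstruction (assumed, not the answerer's original text):
-- how the first query reads once the unqualified names are resolved.
DELETE FROM vaccine_patient_details
WHERE vacc_pat_guid <>
      (SELECT TOP 1 v.vacc_pat_guid
       FROM vaccine_patient_details AS v
       WHERE v.patient_guid = v.patient_guid   -- always true for non-NULL values
         AND v.vaccine_guid = v.vaccine_guid   -- always true for non-NULL values
       ORDER BY v.date_given DESC);
```

Under that reading the subquery would return the same single row for every outer row, namely the one with the latest date_given in the whole table, so the DELETE would remove every other row rather than only the duplicates.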
And a correct one would be:
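The corrected snippet also seems to be missing here. A sketch of what it could look like, again assuming T-SQL: it is the second query from the question with every column qualified. The alias d for the outer table is an illustrative name that does not come from the original post.

```sql
-- Sketch (assumed reconstruction): qualify every column so the subquery
-- is correlated with the row being considered for deletion.
-- The alias "d" is hypothetical; using the table name itself works as well.
DELETE d
FROM vaccine_patient_details AS d
WHERE d.vacc_pat_guid <>
      (SELECT TOP 1 v.vacc_pat_guid
       FROM vaccine_patient_details AS v
       WHERE v.patient_guid = d.patient_guid
         AND v.vaccine_guid = d.vaccine_guid
       ORDER BY v.date_given DESC);
```

Since SQL Server sorts NULL as the lowest value, ORDER BY ... DESC puts rows with a NULL date_given last, which matches the stated preference for the non-null entry. If two duplicates share the same date_given, TOP 1 picks between them arbitrarily; a second sort key such as v.vacc_pat_guid would make the choice deterministic. Running the same FROM and WHERE clauses with SELECT d.* first is a cheap way to check which rows would be removed.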
By specifying the table name in the second query you showed us, you tell the query engine that the subquery has to be correlated with the outer table: the inner table is aliased as ‘v’, so ‘vaccine_patient_details’ can only refer to the outer one, and there is no ambiguity.
In the first query, an unqualified patient_guid could be a column of either the outer table or the inner one, so it is resolved to the closest scope, which is the inner table.
Edit:
From http://dev.mysql.com/doc/refman/5.0/en/delete.html