I am attempting to create a MySQL snippet that will analyse a table and remove duplicate entries (duplicates are determined by two fields, not the entire record).
The following code works when I hard-code the names in the queries, but when I pull them out into variables I get MySQL errors. Here is the script:
SET @tblname = 'mytable';
SET @fieldname = 'myfield';
SET @concat1 = 'checkfield1';
SET @concat2 = 'checkfield2';
ALTER TABLE @tblname ADD `tmpcheck` VARCHAR( 255 ) NOT NULL;
UPDATE @tblname SET `tmpcheck` = CONCAT(@concat1,'-',@concat2);
CREATE TEMPORARY TABLE `tmp_table` (
`tmpfield` VARCHAR( 100 ) NOT NULL
) ENGINE = MYISAM ;
INSERT INTO `tmp_table` (`tmpfield`) SELECT @fieldname FROM @tblname GROUP BY `tmpcheck` HAVING ( COUNT(`tmpcheck`) > 1 );
DELETE FROM @tblname WHERE @fieldname IN (SELECT `tmpfield` FROM `tmp_table`);
ALTER TABLE @tblname DROP `tmpcheck`;
I am getting the following error:
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '@tblname ADD `tmpcheck` VARCHAR( 255 ) NOT NULL' at line 1
Is this because I can’t use a variable for a table name? What else could be wrong, and how would I get around this issue?
Yes, and the same goes for other schema names such as columns. User variables can only be used where MySQL expects a value such as a '-quoted string, not where it expects an identifier.
If you really need to do this, you can with ‘dynamic SQL’: creating your whole query as a string, concatenating the @tblname into the string at that time, and running the lot using PREPARE and EXECUTE. This is pretty ugly and can lead to SQL injection if you’re not careful, so avoid it if there is any other option.
The SELECT @fieldname FROM @tblname GROUP BY tmpcheck step seems problematic to me. Unless myfield has a functional dependency on tmpcheck (which AFAICS it can’t, as tmpcheck is not a primary key), that’s not valid ANSI SQL. MySQL would let you get away with it (at least when ONLY_FULL_GROUP_BY is not enabled), but what you would be saying is “for each group of rows sharing a value of tmpcheck, pick the fieldname from one row out of that group at random for later deletion”. Is that really what you want? I would expect you to want to delete all but one of the duplicates.
Normally you shouldn’t need this kind of complicated procedure to remove duplicates. Just use a DELETE-join:
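Something along these lines should do it. This is a sketch rather than a tested statement: mytable, checkfield1 and checkfield2 are the names from the question, while id is a hypothetical unique, orderable column that you would replace with whatever key the table really has.

```sql
-- Self-join the table on the two fields that define a duplicate.
-- A row t0 is deleted whenever another row t1 has the same pair of
-- values and a higher id, so only the highest-id row of each group survives.
-- NOTE: `id` is an assumed column name, not something taken from the question.
DELETE t0
FROM mytable AS t0
JOIN mytable AS t1
  ON  t1.checkfield1 = t0.checkfield1
  AND t1.checkfield2 = t0.checkfield2
  AND t1.id > t0.id;
```

Rows where either check field is NULL never match the join condition, so they are left alone. Running the same join as a SELECT first is a cheap way to see what would be removed.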
This is assuming an id field that is orderable and UNIQUE so that you can decide which row gets to stay (here, the one with the highest id). myfield might be that field, but I can’t tell from context.
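If the table name really does have to come from a variable, the dynamic SQL route mentioned earlier looks roughly like the following. Treat it as a minimal sketch: @sql and stmt are names made up for illustration, and the backtick-doubling is only a basic guard on the identifier, not a full defence against SQL injection.

```sql
SET @tblname = 'mytable';

-- Build the whole statement as a string. The identifier is wrapped in
-- backticks, and any backtick inside the name is doubled so it cannot
-- terminate the quoting early.
SET @sql = CONCAT(
  'ALTER TABLE `', REPLACE(@tblname, '`', '``'),
  '` ADD `tmpcheck` VARCHAR(255) NOT NULL'
);

-- Prepare, run and release the statement.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

Each statement in the original script that uses @tblname or @fieldname as an identifier would need the same treatment, which is a good part of why the DELETE-join is the less painful option.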