Nothing I have found while searching has worked for me, I think because I am accessing the table through a PHP script, which is different from every example I see. Anyway,
I am importing feeds from a website into a MySQL table. My table was created like this:
$query2 = <<<EOQ
CREATE TABLE IF NOT EXISTS `Entries` (
`feed_id` int(11) NOT NULL,
`item_title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`item_link` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`item_date` varchar(40) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
EOQ;
$result = $db_obj->query($query2);
I insert the data like so:
foreach($rss->channel->item as $Item){
$query5 = <<<EOQ
INSERT INTO Entries (feed_id, item_title, item_link, item_date)
VALUES ('$get_id','$Item->title','$Item->link','$Item->pubDate')
EOQ;
$result = $db_obj->query($query5);
}
Now, every time I import new feeds from the site, I want to make sure I delete any duplicates that might already be there. Nothing I have tried, especially DISTINCT, has worked for me. Does anyone know what queries I could use to create a temp table, copy over the distinct rows (ENTIRE rows: if a title is the same but the date is different, I want to keep both), drop the old table, and then rename the temp table to the original name? Or something similar?
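For reference, the temp-table procedure described above could be expressed roughly as follows. This is a sketch under assumptions, not tested against the asker's server: the function dedupeEntriesSql and the table name Entries_tmp are hypothetical, and it assumes $db_obj->query() runs one statement at a time, as in the question's code. SELECT DISTINCT * compares every column, so two rows that share a title but differ in item_date both survive.

```php
<?php
// Hypothetical one-off cleanup (function and Entries_tmp names are illustrative):
// copy only distinct whole rows into a temp table, drop the old table,
// then rename the copy back to Entries.
function dedupeEntriesSql(): array
{
    return [
        // Same columns, indexes, charset and engine as Entries.
        "CREATE TABLE Entries_tmp LIKE Entries",
        // DISTINCT over * compares entire rows, so rows that differ only
        // in item_date are both kept.
        "INSERT INTO Entries_tmp SELECT DISTINCT * FROM Entries",
        "DROP TABLE Entries",
        "RENAME TABLE Entries_tmp TO Entries",
    ];
}

// Usage (assuming $db_obj from the question):
// foreach (dedupeEntriesSql() as $sql) { $db_obj->query($sql); }
```

Between the DROP and the RENAME the table briefly does not exist, so this is best run while no import is in progress.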
Avoid creating the duplicate rows in the first place. Make any unique values into keys. When adding new values to your database, use REPLACE INTO instead of INSERT INTO.
The duplicates will then be overwritten automatically. REPLACE is handy because it works like INSERT when there is no conflict on a unique key; when there is one, it deletes the existing row and inserts the new one, which is why any auto-incrementing key gets bumped up.
EDIT
I’ve been turning this over for a while. Here’s what I came up with.
The problem with making a multi-column key on (feed_id, item_title, item_link, item_date) is that it would exceed MyISAM's 1000-byte limit on key length: with the utf8 charset each VARCHAR(200) column can take up to 600 bytes, so those two columns alone need 1200. So instead, alter your schema to add a column that holds a hash of the whole record, and make that column a unique key.
Now, when you store a new value, compute a hash of all the values together.
And for your insert statements, use REPLACE INTO and include the hash column.
The hash is an effectively unique representation of the record in its entirety, and it is easy to compare in order to avoid duplicates. Now when you attempt to add the same record more than once, REPLACE will just replace the existing entry, and your query will not fail. Alternatively, you could keep using INSERT; the query will then return a duplicate-key error, which you can handle however you want.
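A minimal sketch of that schema change, assuming MD5 as the hash (32 hex characters) and a '|' separator between fields; the column name item_hash, the key name uniq_item_hash and the function hashMigrationSql are hypothetical. Existing rows are backfilled so they also get a hash, and the unique key is added last because it will fail if identical rows are still present.

```php
<?php
// Hypothetical migration (item_hash / uniq_item_hash are illustrative names):
// adds a whole-row hash column to Entries and makes it unique.
function hashMigrationSql(): array
{
    return [
        // 32 characters = one hex-encoded MD5 digest; about 96 bytes of key
        // in utf8, well under MyISAM's 1000-byte limit.
        "ALTER TABLE Entries ADD COLUMN item_hash CHAR(32) NOT NULL DEFAULT ''",
        // Backfill existing rows. The '|' separator and field order must match
        // whatever the PHP side hashes; assumes a utf8 connection charset.
        "UPDATE Entries SET item_hash = MD5(CONCAT_WS('|', feed_id, item_title, item_link, item_date))",
        // Fails with a duplicate-key error if identical rows still exist,
        // so existing duplicates have to be removed before this step.
        "ALTER TABLE Entries ADD UNIQUE KEY uniq_item_hash (item_hash)",
    ];
}

// Usage (assuming $db_obj->query() runs one statement, as in the question):
// foreach (hashMigrationSql() as $sql) { $db_obj->query($sql); }
```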
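One way the hash could be computed in PHP, as a sketch assuming MD5 over the four fields joined with '|' (itemHash is a hypothetical helper, not an existing function). Identical items always produce the same hash, while an item that differs only in its date produces a different one, which matches the "keep it if the date differs" requirement.

```php
<?php
// Hypothetical helper: one MD5 over every field of a feed item.
// Identical items always hash the same; items that differ in any field
// (for example only in pubDate) get different hashes in practice.
function itemHash(int $feedId, string $title, string $link, string $pubDate): string
{
    // The '|' separator stops ("ab", "c") and ("a", "bc") from hashing the same.
    return md5(implode('|', [$feedId, $title, $link, $pubDate]));
}

// Usage inside the feed loop (SimpleXML values cast to string first):
// $hash = itemHash((int)$get_id, (string)$Item->title, (string)$Item->link, (string)$Item->pubDate);
```

If the table is backfilled in SQL, the separator and field order there have to match this function exactly, or old and new rows will never compare equal.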
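A sketch of the insert side, assuming the item_hash column and MD5 recipe described above; replaceEntrySql is a hypothetical helper. It also escapes every string value, since a title containing an apostrophe would otherwise break the question's original INSERT. The escaping function is passed in; with mysqli that would be [$db_obj, 'real_escape_string'], which assumes $db_obj is a mysqli connection.

```php
<?php
/**
 * Hypothetical helper: builds a REPLACE INTO statement for one feed item,
 * including a whole-row MD5 hash in the (assumed) item_hash column.
 * $escape must escape a string for use inside MySQL single quotes.
 */
function replaceEntrySql(callable $escape, int $feedId, string $title, string $link, string $pubDate): string
{
    // Same recipe as the schema backfill: '|' separator, hex MD5.
    $hash = md5(implode('|', [$feedId, $title, $link, $pubDate]));

    // Escape every string value so quotes in titles can't break the query.
    $values = array_map($escape, [$title, $link, $pubDate, $hash]);

    return sprintf(
        "REPLACE INTO Entries (feed_id, item_title, item_link, item_date, item_hash)\n"
        . "VALUES (%d, '%s', '%s', '%s', '%s')",
        $feedId,
        ...$values
    );
}

// Usage in the question's loop (assuming $db_obj is mysqli):
// foreach ($rss->channel->item as $Item) {
//     $sql = replaceEntrySql([$db_obj, 'real_escape_string'], (int)$get_id,
//         (string)$Item->title, (string)$Item->link, (string)$Item->pubDate);
//     $result = $db_obj->query($sql);
// }
```

Prepared statements would avoid manual escaping altogether; the string-building form is kept here only because it stays closest to the question's heredoc style.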
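If you prefer the plain-INSERT route, a sketch of handling the error might look like this. It assumes $db_obj is a mysqli connection; insertUnlessDuplicate is a hypothetical helper. MySQL reports a unique-key violation as error 1062 (ER_DUP_ENTRY), and the helper treats that one error as "already stored" while letting any other error propagate. It covers both mysqli reporting modes, since since PHP 8.1 mysqli throws mysqli_sql_exception by default instead of returning false.

```php
<?php
// MySQL error number for a duplicate entry on a unique key.
const ER_DUP_ENTRY = 1062;

/**
 * Hypothetical helper: runs a plain INSERT and returns false instead of
 * failing when the row already exists. Assumes $db is a mysqli connection.
 */
function insertUnlessDuplicate(mysqli $db, string $sql): bool
{
    try {
        if ($db->query($sql) === true) {
            return true;  // new row stored
        }
        // Non-throwing mode: query() returned false and set errno.
        if ($db->errno === ER_DUP_ENTRY) {
            return false; // duplicate, skipped
        }
        throw new RuntimeException($db->error, $db->errno);
    } catch (mysqli_sql_exception $e) {
        // Throwing mode: the exception code is the MySQL error number.
        if ($e->getCode() === ER_DUP_ENTRY) {
            return false;
        }
        throw $e;
    }
}
```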