i’m working on a webpage indexing robot and using php and mysql. one of the tables will be to store the list of webpages. i want that table to have an integer based id column (so that i can join it to other tables more quickly, and so that other tables can reference it without storing the complete url in a column), but i also want the url column to remain unique.
the reason this question has come up is that if my script reads a web page that has 250 links i’m not sure the best way to add them to the webpages table without creating duplicates. i could loop the 250 links that i found and query the table for each of them to make sure they don’t exist, and then insert the ones that do – which would require at least 250 queries.
however, if i could have mysql enforce the url to be a unique column i could simply do one insert ignore, statement with all of the links. this would keep the existing records from being duplicated while adding new ones. if i had this type of table with an integer id column that is unique / primary and auto-incremented, could i also specify that the url column be unique?
id (auto inc) url added
----------------------------------------------------------------------
1 http://site.com/page-a/ 2010-01-01 01:23:34
2 http://site.com/page-b/ 2010-01-01 01:23:34
3 http://site.com/page-c/ 2010-01-01 01:23:34
another option that has crossed my mind is setting the url to be a unique primary key, and then manually incrementing the id column with php / mysql statements:
url (unique, primary) id added
-------------------------------------------------------------
http://site.com/page-a/ 1 2010-01-01 01:23:34
http://site.com/page-b/ 2 2010-01-01 01:23:34
http://site.com/page-c/ 3 2010-01-01 01:23:34
it seems like that would have the benefit of allowing mysql to enforce a unique url column (and thus allowing my insert ignore method), and also allowing me to join on an integer. the only weird thing is having to rely on my script to add and increment the id column, but i may be willing to do that if there is not a better option.
what are my options?
Yes, you can make the
urlcolumn unique. So a bigINSERT IGNORE INTO ... (...) VALUES (...), (...), ...will do the job.Do NOT make
urlprimary though! Your primary key should be theidauto_increment field; simply create a unique index on theurlfield.Don’t forget that
INSERT IGNOREis MySQL-specific though so if you ever plan to support e.g PostgreSQL, too, your code won’t work.