I am collecting articles from numerous RSS feeds into a MySQL database (just the title and link from the actual feed), and I would like to make sure I do not enter the same article twice when rechecking the feeds. I anticipate storing up to 200,000 entries in the table.
Which would be the best way to check for duplicates:
1. Make the URL a unique field in the DB,
2. Create a new unique identifier for every article (like SHA1 the URL and/or title),
3. Something else?
Edit: Thanks everyone for confirming #1.
UNIQUE keys are designed for this. If you want to bulk insert but may hit duplicate-key errors, use
INSERT IGNORE
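To illustrate the idea, here is a minimal, runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, which is an assumption on my part. SQLite spells the statement INSERT OR IGNORE; in MySQL the equivalent is INSERT IGNORE. The table name `articles`, its columns, and the helpers `make_db` and `store_articles` are hypothetical names chosen for the example.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a UNIQUE constraint on the URL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " url TEXT NOT NULL UNIQUE)"  # the unique key does the duplicate check
    )
    return conn


def store_articles(conn, items):
    """Bulk insert (title, url) pairs, silently skipping URLs already stored.

    Returns the number of rows actually inserted.
    """
    before = conn.total_changes
    # SQLite syntax; the MySQL equivalent is: INSERT IGNORE INTO articles ...
    conn.executemany(
        "INSERT OR IGNORE INTO articles (title, url) VALUES (?, ?)",
        items,
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = make_db()
    first = [("A", "http://example.com/a"), ("B", "http://example.com/b")]
    recheck = [("A again", "http://example.com/a"), ("C", "http://example.com/c")]
    print(store_articles(conn, first))    # both rows are new
    print(store_articles(conn, recheck))  # only the unseen URL is inserted
```

The point of the design is that the database enforces uniqueness, so rechecking a feed needs no SELECT before each insert: rows whose URL already exists are simply skipped.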