Consider the following table, “tweets”:
tweet_id  call_id  id_str  timestamp  text
--------------------------------------------------
1         11       12345   312323134  lorem ipsum
2         11       12345   312323134  lorem ipsum
3         11       12345   312323134  lorem ipsum
4         11       12345   312323134  lorem ipsum
5         11       67890   325565454  dolor
6         11       34355   333544664  samet
Each tweet should appear only once in the database. As you can see, the exact same tweet (same call_id, id_str, timestamp and text; in other words, everything is the same except for the tweet_id, which is an autonumbering field) has been stored 4 times (!).
Is there a way to find exact matches (so: same everything except tweet_id) and then delete the last x - 1 (here: 4 - 1 = 3) of them? In other words, the cleaned-up table would look like:
tweet_id  call_id  id_str  timestamp  text
--------------------------------------------------
1         11       12345   312323134  lorem ipsum
5         11       67890   325565454  dolor
6         11       34355   333544664  samet
I hope there’s an easy way to do this, because otherwise I have a huge problem. (It shows you what putting in an extra hour of thinking before you actually build your database can do!)
Have you searched for a solution online before asking this question? If not, here is an online tutorial on how to do this:
http://www.sqlteam.com/article/deleting-duplicate-records
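For what it's worth, here is a minimal sketch of one common way to do this. It is not necessarily the method from the linked tutorial. It groups rows on every column except tweet_id, keeps the lowest tweet_id in each group, and deletes the rest. The question doesn't name the database engine or the column types, so both are assumptions here: the sketch uses an in-memory SQLite database through Python's sqlite3 module, purely so it can be run as-is, with guessed column types.

```python
import sqlite3

# Assumption: SQLite in memory, with guessed column types. The original
# question does not say which engine or schema is actually in use.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tweets (
        tweet_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        call_id   INTEGER,
        id_str    TEXT,
        timestamp INTEGER,
        text      TEXT
    )
""")

# Sample data from the question: the first tweet is stored four times.
rows = [
    (11, "12345", 312323134, "lorem ipsum"),
    (11, "12345", 312323134, "lorem ipsum"),
    (11, "12345", 312323134, "lorem ipsum"),
    (11, "12345", 312323134, "lorem ipsum"),
    (11, "67890", 325565454, "dolor"),
    (11, "34355", 333544664, "samet"),
]
conn.executemany(
    "INSERT INTO tweets (call_id, id_str, timestamp, text) VALUES (?, ?, ?, ?)",
    rows,
)

# Keep the lowest tweet_id of each group of identical rows; delete the others.
conn.execute("""
    DELETE FROM tweets
    WHERE tweet_id NOT IN (
        SELECT MIN(tweet_id)
        FROM tweets
        GROUP BY call_id, id_str, timestamp, text
    )
""")
conn.commit()

remaining = conn.execute(
    "SELECT tweet_id, call_id, id_str, timestamp, text "
    "FROM tweets ORDER BY tweet_id"
).fetchall()
for row in remaining:
    print(row)
```

The DELETE statement is the part that matters; the rest only sets up the example data. Some engines (MySQL, for one) reject a subquery that reads from the same table being deleted from, in which case the inner SELECT usually has to be wrapped in a derived table first. Either way, try it on a copy of the table before running it on the real one.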