I have a SQL Server 2008 database table that holds links to articles.
My app's routine is that every ~1-10 seconds I get a list of 10-100 new articles, each containing a URL,
and I need to check every article's URL and add it to the db if it doesn't already exist there.
How I do it now:
First, I made a unique index on the URL, so no matter what, I won't have the same URL more than once (of course I normalize the URL, e.g. cut its 'http://www.' prefix etc., before I insert it).
The 'InsertArticles' method is something like this:
- Open a transaction
- For each link, check (using the transaction) whether its URL exists in the db
- For each link that doesn't exist yet, add the link (using the same transaction, of course)
- Commit and close the transaction, and handle transaction/general exceptions
The thing is, most of the time it works very fast (0.05-0.2 secs) for about 10-20 or so links,
but sometimes it gets much slower: it can even take 50 secs to call this method with 50 articles.
So, 2 questions here:
- Is what I do OK? Should I use transactions for this kind of job?
- What alternatives do I have? Maybe an "insert if not exists"?
I was also thinking: why not just 'brute insert' the new articles into the db? Meaning, I would try to insert all the input URLs and let SQL Server throw an exception for those URLs that already exist there.
Maybe using a stored procedure to do all of this could improve performance?
Anyway, any help would be appreciated.
You could try the MERGE-statement to combine the SELECT and INSERT into one statement:
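Here is a rough sketch of what that could look like. The question doesn't show the schema, so the table dbo.Articles, its columns Url and Title, the table type dbo.ArticleList and the procedure name are all assumptions; adjust them to your own names. It passes the whole batch as a table-valued parameter (available from SQL Server 2008 on), so the existence check and the insert happen in one set-based statement instead of one round trip per link.

```sql
-- Assumed target table: dbo.Articles(Url NVARCHAR(400), Title NVARCHAR(400)),
-- with the unique index on Url that the question describes.

-- Hypothetical table type used to hand the whole batch to the server at once.
-- The primary key rejects a batch that contains the same URL twice, so
-- de-duplicate the (normalized) URLs on the client before sending them.
CREATE TYPE dbo.ArticleList AS TABLE
(
    Url   NVARCHAR(400) NOT NULL PRIMARY KEY,
    Title NVARCHAR(400) NULL
);
GO

-- Hypothetical procedure name, chosen to match the method in the question.
CREATE PROCEDURE dbo.InsertArticles
    @NewArticles dbo.ArticleList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- HOLDLOCK keeps a concurrent caller from inserting the same URL
    -- between the "does it exist" check and the insert.
    MERGE dbo.Articles WITH (HOLDLOCK) AS tgt
    USING @NewArticles AS src
        ON tgt.Url = src.Url
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Url, Title)
        VALUES (src.Url, src.Title);
END
GO
```

A single MERGE is atomic on its own, so for this job you shouldn't need an explicit transaction around it. If the app uses ADO.NET, the batch can be sent as a DataTable in a SqlParameter with SqlDbType.Structured and TypeName set to the table type's name.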