I’m doing some web scraping to build up a personal SQL database. As I’m looping through the web requests, I’m adding records. The only thing is, duplicates sometimes appear in the web requests and I want to make sure to only add a record if it doesn’t already exist in my database. I gather this can be done by performing an SQL query before every insert to make sure that record hasn’t already been added, but is this the best way to do it? Would it make more sense to build up a Generic.List first, and then do all my database inserts at the end?
You can create a stored procedure that first tries to update the record and then inserts it only if the update affected no rows. This keeps the number of queries down and avoids a separate existence check before every insert. A little bit of Googling found this. The second option looks like it might be what you are looking for.
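Here is a minimal sketch of the update-then-insert logic. It assumes SQLite through Python's built-in `sqlite3` module purely for illustration. SQLite has no stored procedures, so the logic lives in an ordinary function. The `pages` table, its `url`/`title` columns and the `upsert_page` helper are hypothetical names, not something from the question.

```python
import sqlite3


def upsert_page(conn: sqlite3.Connection, url: str, title: str) -> str:
    """Hypothetical helper: update the row for `url` if it exists, else insert it.

    Returns "updated" or "inserted" so the caller can see which path ran.
    """
    # Try the UPDATE first; rowcount reports how many rows it changed.
    cur = conn.execute("UPDATE pages SET title = ? WHERE url = ?", (title, url))
    if cur.rowcount == 0:
        # No existing row matched, so this record is new: insert it.
        conn.execute("INSERT INTO pages (url, title) VALUES (?, ?)", (url, title))
        return "inserted"
    return "updated"


# Assumed schema: the URL identifies a scraped record uniquely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT)")

# Simulated scrape results, including one duplicate URL.
scraped = [
    ("https://example.com/a", "A"),
    ("https://example.com/b", "B"),
    ("https://example.com/a", "A (again)"),
]
results = [upsert_page(conn, url, title) for url, title in scraped]
conn.commit()
print(results)  # ['inserted', 'inserted', 'updated']
```

The duplicate URL updates the existing row instead of adding a second one. Each record costs one query, or two when it is new. With a single scraper writing to the database this is enough. If several processes wrote at once, the `PRIMARY KEY` constraint would still stop duplicate rows from being stored.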