I need ideas for a problem I am working on:
I am writing a data synchronizer in C# (.NET) that will receive CSV files, one for each table in a SQL Server database.
Some of the rows in the CSV files will reference existing rows in the database and require an update; others will reference new rows and require an insert.
Since there might be a lot of files (20 or so) and potentially a lot of rows in each, how can I make this scalable? Reading one row at a time, querying the database to check whether a row with the same ID already exists (to decide between an update and an insert), and then making a second round trip for the actual update or insert seems wasteful.
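Load everything from the file into a temporary (staging) table with a bulk insert.
Then perform a MERGE from the staging table into the target table, so that existing rows are updated and new rows are inserted in a single set-based statement.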
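
The two steps above could look roughly like the sketch below. This is only an illustration under some assumptions, not a drop-in implementation: the target table `dbo.Customer` with columns `Id` and `Name`, the temp table `#CustomerStaging`, and the class and method names are all made up for the example, and the CSV is assumed to have already been parsed into a `DataTable` with matching column names. It relies on `SqlBulkCopy` for the bulk load and a T-SQL `MERGE` for the upsert, both run on the same open connection so the temp table stays visible.

```csharp
using System.Data;
using System.Data.SqlClient; // Microsoft.Data.SqlClient exposes the same types on newer projects

public static class CsvTableSync
{
    // Hypothetical target: dbo.Customer (Id INT PRIMARY KEY, Name NVARCHAR(100)).
    // 'rows' is assumed to hold the parsed CSV with columns named "Id" and "Name".
    public static void Upsert(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                // 1. Create a session-scoped temp table to stage the file's rows.
                const string createSql =
                    "CREATE TABLE #CustomerStaging (Id INT NOT NULL PRIMARY KEY, Name NVARCHAR(100) NULL);";
                using (var create = new SqlCommand(createSql, connection, transaction))
                {
                    create.ExecuteNonQuery();
                }

                // 2. Bulk insert all rows into the staging table in one operation.
                using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
                {
                    bulk.DestinationTableName = "#CustomerStaging";
                    bulk.ColumnMappings.Add("Id", "Id");
                    bulk.ColumnMappings.Add("Name", "Name");
                    bulk.WriteToServer(rows);
                }

                // 3. Merge staging into the target: update matches, insert the rest.
                const string mergeSql = @"
MERGE dbo.Customer AS tgt
USING #CustomerStaging AS src
    ON tgt.Id = src.Id
WHEN MATCHED THEN
    UPDATE SET tgt.Name = src.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Name) VALUES (src.Id, src.Name);";
                using (var merge = new SqlCommand(mergeSql, connection, transaction))
                {
                    merge.ExecuteNonQuery();
                }

                // Commit so the whole file is applied atomically.
                transaction.Commit();
            }
        }
    }
}
```

With this shape, each CSV file costs three round trips regardless of its row count, instead of two per row. The same pattern would be repeated per table, with the column list and key adjusted for each one.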