I have a database that isn’t very fast, and a large CSV of about 65,000 rows. I need to cross-check the two for existence and update the database where needed.
- In the CSV, there is a column that contains the database IDs. It is always a 1:1 relationship.
- The CSV may hold new input for the database, so some rows may have no matching DB entry.
- I cannot loop through the CSV and check each row, because it is too slow.
- Fetching all results from the database up front and keeping them around to loop through every time won’t work, because that uses a lot of RAM.
How can I do the following:
- Check whether a row in the CSV has a database entry. If so, write it to a separate CSV file.
- If the row has no database entry, write it to a different file.
- Keep the total run time under 5 minutes, preferably shorter.
The CSV has a lot of columns (for example 70), but I only need column 5 for cross-checking the IDs. I have tried looping through the CSV file first and checking each row against the database, but that is too slow: it can take over 10 minutes. I have also tried fetching all entries from the database and looping through those, and within that loop running through the CSV (using a BufferedStream) to check for a match. This cuts the time significantly (5 minutes at most), but it cannot record the entries that do not exist in the database.
Is there any way I can do this while keeping the speed up?
Late answer, but I fixed it this way: I pull the CSV columns that I need into a DataTable. Then I fetch all the database rows that I need to check (they have a certain number I can filter on) and run through them. For each database row, I look up the corresponding ID in the DataTable and write the data to a new CSV. After that, the row is deleted from the DataTable. In the end I have a CSV with the rows that do exist and will be imported into the system, and a DataTable with the rows that need to be added, which is exported to a second CSV.

Thanks to Gregory for helping me get on the right track.
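For anyone who wants the idea in code, the approach above can be sketched roughly as follows. This is a minimal Python illustration, not the original .NET code: a plain dictionary stands in for the DataTable, and the function name `cross_check`, the zero-based `ID_COLUMN = 4` (for "column 5"), and the absence of a header row are all assumptions made for the example. The database side is reduced to any iterable of IDs, so it should work with whatever cursor or reader your driver provides.

```python
import csv
import io

ID_COLUMN = 4  # assumption: "column 5" means zero-based index 4, no header row


def cross_check(csv_lines, db_ids, existing_out, new_out, id_column=ID_COLUMN):
    """Split CSV rows into 'already in the database' and 'not yet in the database'.

    csv_lines    -- open file (or any iterable of lines) containing the CSV
    db_ids       -- iterable of IDs streamed from the database
    existing_out -- writable text stream for rows that have a DB entry
    new_out      -- writable text stream for rows that have no DB entry
    """
    # Read the CSV once and index the rows by ID (the relationship is 1:1).
    pending = {}
    for row in csv.reader(csv_lines):
        if not row:
            continue  # skip blank lines
        pending[row[id_column]] = row

    existing_writer = csv.writer(existing_out)
    new_writer = csv.writer(new_out)

    # One pass over the database IDs; each lookup is a constant-time dict hit.
    for db_id in db_ids:
        row = pending.pop(str(db_id), None)  # remove the row once matched
        if row is not None:
            existing_writer.writerow(row)

    # Whatever is left was never matched, so it has no database entry.
    for row in pending.values():
        new_writer.writerow(row)


# Tiny in-memory demonstration with made-up data.
demo_csv = io.StringIO("a,b,c,d,1\na,b,c,d,2\na,b,c,d,3\n")
demo_existing, demo_new = io.StringIO(), io.StringIO()
cross_check(demo_csv, [1, 3], demo_existing, demo_new)
```

The point of the design is that the CSV and the database are each read only once, and the per-row check is a keyed lookup instead of a nested loop. Storing only the columns you actually need, as described above, would reduce memory further; the sketch keeps whole rows purely for brevity.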