In my ASP.NET web app I’m trying to implement an import/export procedure to save or insert data in the application DB. My procedure generates some CSV files: one for each table.
Obviously there are relations between some of these tables and when I import CSV in my DB I’d like to maintain association between rows.
Say I have Table1 and Table2 with Table2 that has a foreign key to Table1. So I could have a row in Table1 with ID = 100 and a row in Table2 with Table1_ID = 100.
When I import CSV with Table1 data, new IDs are generated for Table1 rows, how can I maintain consistency of the foreign keys in Table2 when I import the corresponding CSV file?
I’m using Linq-to-SQL to retrieve data from DB… using DataSet and DataTable can help me?
NOTE I’d like to permit cumulative import, so when I import a CSV file there may already be data in the DB. So I cannot use ‘Set Identity OFF’.
Add the items of Table1 first, so when you add the items of Table2 there are the corresponding records of Table1 already in the database. For more tables you will have figure out the order. If you are creating a system of arbitrary database schema, you will want to create a table graph (where each node is a table and each arc is a foreign key) in memory [There are no types for that in the base library] and then convert it to a tree such that you get the correct order by traversing the tree (breadth-first).
You can let the database handle the cases where there is a violation of the foreign key, because there is not such field. You will have to decide if you make a transaction of the whole import operation, or per item.
Although analisying the CSVs before hand is possible. To do that, you will want to store the values for the primary key of each table [Use a set for that] (again, iterate over the tables in the correct order), and then when you are reading a table that has a foreign key to a table that you have already read you can check if the key is there, also it will help you yo detect any possible duplicate. [If you have things already in the database to take into account, you would have to query too… although, take care if the database is in an active system where records could be deleted while you are still deciding if you can add the CSVs without problem].
To address that you are generating new IDs when you add…
The simplest solution that I can think of is: don’t. In particular if it is an active system, where other requests are being processed, because then there is no way to predict the new IDs before hand. Your best bet would be to add them one by one, in that case, you will have to think your transaction strategy accordningly… it may be the case that you will not be able to roll back.
Although, I think your question is a bit deeper: If the ID of the Table1 did change, then how can I update the corresponding records in the Table2 so they point to the correct record in Table1?
To do that, I want to suggest to do the analysis as I described above, then you will have a group of sets that will works as indexes. This will help you locate the records that you need to update in Table2 for each ID in Table1. [It is also important to keep track if you have already updated a record, and don’t do it twice, because it may happen the generated ID match an ID that is yet to be sent to the database].
To roll back, you can also use those sets, as they will end up having the new IDs that identify the records that you will have to pull out of the database if you want to abort the operation.
Edit: those sets (I recommend hashset) are only have the story, because they only have the primary key (for intance: ID in Table1). You will need bags to keep the foreing keys (in this case Table1_ID in Table2).