What is the fastest way to load data from flatfiles into a MySQL database, and then create the relations between the tables via foreign keys?
For example… I have a flat file in the format:
[INDIVIDUAL] [POP] [MARKER] [GENOTYPE] 'INDIVIDUAL1', 'CEU', 'rs55555','AA' 'INDIVIDUAL1', 'CEU', 'rs535454','GA' 'INDIVIDUAL1', 'CEU', 'rs555566','AT' 'INDIVIDUAL1', 'CEU', 'rs12345','TT' ... 'INDIVIDUAL2', 'JPT', 'rs55555','AT'
Which I need to load into four tables:
IND (id,fk_pop,name) POP (id,population) MARKER (id,rsid) GENOTYPE (id,fk_ind,fk_rsid,call)
Specifically, how does one populate the foreign keys in a way that scales? The figures are in the range of 1000+ individuals, each with 1 million+ genotypes.
There is a simpler way.
First, make sure you have a UNIQUE constraint on those columns that should have one (name, population, rsid).
Then use something like the following:
Note where the @ is used to indicate variables, rather than column names. In the first 2 LOAD DATAs, these are just used to ignore data. In the second 2, they are used to look up the foreign keys.
Might not be very fast, mind :).