I have files I need to convert into a database. These files (I have over 100k) are from an old system (generated from a COBOL script). I am now part of the team that migrate data from this system to the new system.
Now, because we have a lot of files to parse (each files is from 50mb to 100mb) I want to make sure I use the right methods in order to convert them to sql statement.
Most of the files have these following format:
#id<tab>name<tab>address1<tab>address2<tab>city<tab>state<tab>zip<tab>country<tab>#\n
the address2 is optional and can be empty
or
#id<tab>client<tab>taxid<tab>tagid<tab>address1<tab>address2<tab>city<tab>state<tab>zip<tab>country<tab>#\n
these are the 2 most common lines (I’ll say around 50%), other than these, all the line looks the same but with different information.
Now, my question is what should I do to open them to be as efficient as possible and parse them correctly?
Honestly, I wouldn’t use PHP for this. I’d use awk. With input that’s as predictably formatted as this, it’ll run faster, and you can output into SQL commands which you can also insert via a command line.
If you have other reasons why you need to use PHP, you probably want to investigate the fgetcsv() function. Output is an array which you can parse into your insert. One of the first user-provided examples takes CSV and inserts it into MySQL. And this function does let you specify your own delimiter, so tab will be fine.
If the id# in the first column is unique in your input data, then you should definitely insert this into a primary key in mysql, to save you from duplicating data if you have to restart your batch.