I am making a script to automatically parse some text data (with a complex structure) and insert it into a MySQL database.
I would like to use nested for loops that iterate over a list of files, grouped by regex matches on the file names. In the end I will concatenate the grouped files and insert the result into the database.
Here are my regular expressions:
Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')
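In a regular expression an unescaped `.` matches any character, so a pattern ending in `.csv$` also accepts names where something other than a literal dot sits before `csv`. The short check below compares the Hrgx pattern with and without the escape; the second file name is a made-up example, not one from the list.

```python
import re

# Hrgx with an unescaped "." before "csv": the "." matches ANY character.
loose = re.compile('.*([0-9]{3}).csv$')
# Same pattern as a raw string with the dot escaped: only a literal "." matches.
strict = re.compile(r'.*([0-9]{3})\.csv$')

name_ok = 'ecd_cdd_2012102100_000.csv'
name_bad = 'ecd_cdd_2012102100_000_csv'  # hypothetical name with no real extension

print(loose.match(name_ok).group(1))       # 000
print(strict.match(name_ok).group(1))      # 000
print(loose.match(name_bad) is not None)   # True: "_" satisfied the unescaped "."
print(strict.match(name_bad) is not None)  # False
```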
My filenames look like this:
ecd_cdd_2012102100_000.csv
ecd_cdd_2012102100_024.csv
ecd_hdd_2012102200_000.csv
ecd_hdd_2012102200_024.csv
ecd_hdd_2012102200_048.csv
ecd_avgt_2012102200_120.csv
ecd_avgt_2012102200_144.csv
ecd_avgt_2012102200_168.csv
ecd_mint_2012102200_192.csv
ecd_maxt_2012102200_144.csv
ecd_maxt_2012102200_168.csv
ecd_cdd_2012102200_000.csv
ecd_cdd_2012102200_024.csv
Each expression captures a subset of the file name:
- Trgx captures the first part (“ecd” in every instance in this example)
- Mrgx captures the second part (“cdd”, “hdd”, “avgt”, etc.)
- Dtrgx captures the date/time segment (like 2012102100)
- Hrgx captures the last part (like 000 or 024, before the extension)
Every file name will match every regular expression, but each expression's .group(1) holds a different part of the name.
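To see what each pattern's `.group(1)` holds, the sketch below applies all four expressions (written as raw strings, with the dot before `csv` escaped) to one of the sample names using `re.match`, which anchors at the start of the string. The dict keys `T`, `M`, `D` and `H` are labels chosen for illustration.

```python
import re

# The four patterns from the question, as raw strings.
Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')

name = 'ecd_avgt_2012102200_120.csv'
# Every pattern matches the same name; group(1) picks out a different field each time.
parts = {
    'T': Trgx.match(name).group(1),
    'M': Mrgx.match(name).group(1),
    'D': Dtrgx.match(name).group(1),
    'H': Hrgx.match(name).group(1),
}
print(parts)
# {'T': 'ecd', 'M': 'avgt', 'D': '2012102200', 'H': '120'}
```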
I want to iterate through the files using the regular expressions as grouping keys, so that I can concatenate the files together in the right order.
Something like this:
for fileName in fileNameList:
    for each distinct value in Trgx.group(1):
        for each distinct value in Dtrgx.group(1):
            for each distinct value in Hrgx.group(1):
                do whatever
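A runnable version of those nested loops might look like the sketch below. It assumes "the right order" means sorted by prefix, then date/time, then hour, with the files that share those three values (the different measures such as cdd or hdd) ordered by measure name. Instead of literal nested loops it builds a dictionary keyed on a (prefix, date/time, hour) tuple; sorting the tuples visits the groups in the same order the nested loops would. `group_files` and `measure` are hypothetical helper names.

```python
import re
from collections import defaultdict

# The question's patterns as raw strings (dot before "csv" escaped).
Trgx = re.compile(r'([a-zA-Z0-9]{3,4})_.*_.*_.*$')
Dtrgx = re.compile(r'[a-zA-Z0-9]{3,4}_[a-zA-Z0-9]{3,4}_([0-9]{10})_[0-9]{3}')
Mrgx = re.compile(r'.*_([a-zA-Z0-9]{3,4})_.*$')
Hrgx = re.compile(r'.*([0-9]{3})\.csv$')

fileNameList = [
    'ecd_cdd_2012102100_000.csv', 'ecd_cdd_2012102100_024.csv',
    'ecd_hdd_2012102200_000.csv', 'ecd_hdd_2012102200_024.csv',
    'ecd_hdd_2012102200_048.csv', 'ecd_avgt_2012102200_120.csv',
    'ecd_avgt_2012102200_144.csv', 'ecd_avgt_2012102200_168.csv',
    'ecd_mint_2012102200_192.csv', 'ecd_maxt_2012102200_144.csv',
    'ecd_maxt_2012102200_168.csv', 'ecd_cdd_2012102200_000.csv',
    'ecd_cdd_2012102200_024.csv',
]

def measure(name):
    """Hypothetical helper: return the measure field (cdd, hdd, avgt, ...)."""
    return Mrgx.match(name).group(1)

def group_files(names):
    """Hypothetical helper: map (prefix, date/time, hour) -> list of file names."""
    groups = defaultdict(list)
    for name in names:
        key = (Trgx.match(name).group(1),
               Dtrgx.match(name).group(1),
               Hrgx.match(name).group(1))
        groups[key].append(name)
    return groups

groups = group_files(fileNameList)
# Sorted tuple keys visit prefix, then date/time, then hour, like the nested loops;
# within a group, files are ordered by measure name.
ordered = [(key, sorted(groups[key], key=measure)) for key in sorted(groups)]

for (t, d, h), files in ordered:
    print(t, d, h, files)  # "do whatever": e.g. concatenate these files
```

With the sample list this yields nine groups; for example the group for 2012102200 and hour 144 contains the avgt file followed by the maxt file.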
It may be easier to combine the regexes into a single pattern and save yourself the trouble of maintaining four of them.
Then you can just do
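One way a combined pattern could look, sketched here with named groups (`prefix`, `measure`, `datetime` and `hour` are names chosen for illustration, not taken from the original answer). `match.groupdict()` returns all four fields at once, and `match.group(...)` with several group names returns them as a tuple, which works directly as a sort key.

```python
import re

# Hypothetical combined pattern: one named group per field of the file name.
FILE_RE = re.compile(
    r'^(?P<prefix>[a-zA-Z0-9]{3,4})_'
    r'(?P<measure>[a-zA-Z0-9]{3,4})_'
    r'(?P<datetime>[0-9]{10})_'
    r'(?P<hour>[0-9]{3})\.csv$'
)

def parse_name(name):
    """Return the four fields as a dict, or None if the name does not fit."""
    m = FILE_RE.match(name)
    return m.groupdict() if m else None

def sort_key(name):
    """Order by prefix, date/time, hour, then measure (mirrors the nested loops)."""
    return FILE_RE.match(name).group('prefix', 'datetime', 'hour', 'measure')

print(parse_name('ecd_hdd_2012102200_048.csv'))
# {'prefix': 'ecd', 'measure': 'hdd', 'datetime': '2012102200', 'hour': '048'}
print(parse_name('notes.txt'))  # None

names = ['ecd_hdd_2012102200_000.csv', 'ecd_cdd_2012102100_024.csv',
         'ecd_cdd_2012102200_000.csv']
print(sorted(names, key=sort_key))
# ['ecd_cdd_2012102100_024.csv', 'ecd_cdd_2012102200_000.csv', 'ecd_hdd_2012102200_000.csv']
```

Anchoring the pattern with `^` and `$` also means a name with an unexpected shape returns None instead of silently matching part of the string.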