I am trying to use the FileHelpers library to parse a text file. Ultimately the data will be stored in a database. My text file contains positional records. The first two characters of each record define its position in the hierarchy of records.
The file is ordered in the following manner:
- 10 Common Data (10-19 all have the same level)
  - 20 2nd level Data (20-29 have the same level)
    - 30 3rd level data (30-39 have the same level)
      - 40 4th level Data
        - 50 5th level data
          - 60 Last level Data
          - 60 Last level Data
        - 50 5th level data
          - 60 Last level Data
          - 60 Last level Data
        - 50 5th level data
      - 40 4th level Data
        - 50 5th level data
          - 60 Last level Data
          - 60 Last level Data
        - 50 5th level data
          - 60 Last level Data
          - 60 Last level Data
        - 50 5th level data
      - 40 4th level Data
    - 30 3rd level data
      - repeated sequence of 40, 50, 60..
    - 30 3rd level data (30-39 have the same level)
  - 20 2nd level Data
    - repeated sequence of 20, 40, 50, 60.. and so on…
  - 20 2nd level Data (20-29 have the same level)
Now I am trying to use the Master-Detail concept of FileHelpers, but I guess it only works for one level of Master-Detail. Can it be used to create a hierarchy of data which can then be used to fill the relevant tables? All the records are fixed-length records, so no problem there.
Caution: there is no primary-foreign key relation between the records. The position and the record number tell which record is the parent (master) and which are its children (details).
Sample data is given below:
10R 420120320F 20120320212045 16
11F FFuture
11C OCall
11P OPut
12CADCanadian Dollars 0
12CHFSwiss Francs 0
12CZKCzech Republic Korun 0
12DEMGerman Marks 0
12DKKDanish Krone 0
12ESBSpanish Pesatas 3
12EUREuropean currency Un 0
12FIMFinnish Mark 0
14 1 20.0000 100 2O UKX 1A 1L Z 1B 1
14 2 20.0000 100 2L EFE 1A 1O EFE 1B 1
14 3 20.0000 100 2L EFP 1A 1O EFP 1B 1
14 4 20.0000 100 2L CCI 1A 1O CCI 1B 1
14 5 20.0000 100 2L AXI 1A 1O AXI 1B 1
14 6 20.0000 100 2L BLI 1A 1O BLI 1B 1
15 1F+0, VOL+ 2
15 2F+0, VOL- 1
15 3F+1/3, VOL+ 4
15 4F+1/3, VOL- 3
15 5F-1/3, VOL+ 6
15 16F-EXTREME 16
16EQYLIFFE Equities
16IPEIntl. Petroleum Exchange
16LCPLIFFE Commodity Products
16LIFLIFFE Financials
16LIGLIFFE OTC
16LMELME Metals
20L LIFFE F
30AXIAEX Index EQYEUR2.000.3500 10 110 1
31 1 10000000099999999
32 1 220 2 1 1A 1 1B
34 1 1 1 1
40ZAXFAEX Index Future EUR 10000 10 0.02000 1.00 0 0 2000002
50201204000.0000000.25000.2500 120120400
60 0F 1 3308420 1.0000 0 0 -66667 -66667 66667 66667-133333-133333 133333 133333-200000-200000 200000 200000-140000 140000
50201205000.0000000.25000.2500 120120500
60 0F 1 3262910 1.0000 0 0 -66667 -66667 66667 66667-133333-133333 133333 133333-200000-200000 200000 200000-140000 140000
50201206000.0000000.25000.2500 120120600
60 0F 1 3258970 1.0000 0 0 -66667 -66667 66667 66667-133333-133333 133333 133333-200000-200000 200000 200000-140000 140000
40I OTHREE MONTH EURO (EUEUR 10000 25 0.25000 1.00 3 1000 32002
50201204000.0000000.35000.3500 120120600
60 97750C 1 16000 1.0000 0 0 -1067 -1067 1067 1067 -2133 -2133 2133 2133 -3200 -3200 3200 3200 -2240 2240
60 97750P 1 0 0.0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60 97875C 1 14750 1.0000 0 0 -1067 -1067 1067 1067 -2133 -2133 2133 2133 -3200 -3200 3200 3200 -2240 2240
60 97875P 1 0 0.0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30L 3 Month Pound Sterli LIFGBP2.000.3500 11010 1
31 6 10000000020120600 22012090020121200 32013030020131200 42014030020141200 52015030020151200 62016030099999999
32 1 50 2 2 1A 2 1B
32 2 55 2 1 1A 1 1B
32 21 290 2 6 1A 6 1B
34 2 1 1 3 2 4 6
Can anybody guide me on how to use FileHelpers, or any other library or algorithm, to parse this? Using XML here can be a problem as the file size is huge (100 MB), so I would prefer a non-XML-based approach (my previous approach was XML-based and was rejected by my architect).
Thanks in advance.
FileHelpers is not really designed for formats that complex. You might get somewhere with the MultiRecord engine if you define a separate format for each row and parse them all based on the start of line, but you will find it tricky to link child records with parent records.
I think your best approach would be to code it manually. Something like
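As a minimal sketch of such a hand-written parser (in Python for brevity rather than C#, and not the answerer's original code): keep a stack of open parents and attach each record to the nearest preceding record of a lower level. The names `Node`, `level_of` and `build_tree` are made up for illustration. The sketch assumes the level is the first digit of the two-character prefix, so records that share a level (for example 30 and 34) are siblings and a following 40 record attaches to whichever of them came last.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Node:
    """One record plus the records nested beneath it."""
    record_type: str                 # first two characters, e.g. "40"
    line: str                        # raw fixed-length record
    children: List["Node"] = field(default_factory=list)


def level_of(record_type: str) -> int:
    """Assumption: 10-19 are level 1, 20-29 are level 2, and so on."""
    return int(record_type) // 10


def build_tree(lines: Iterable[str]) -> List[Node]:
    """Attach each record to the nearest preceding record of a lower level."""
    roots: List[Node] = []
    stack: List[Node] = []           # stack[-1] is the current candidate parent
    for raw in lines:
        line = raw.rstrip("\r\n")
        if len(line) < 2:
            continue                 # skip blank lines
        node = Node(line[:2], line)
        # Pop until the top of the stack sits on a strictly lower level.
        while stack and level_of(stack[-1].record_type) >= level_of(node.record_type):
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots
```

If the supplementary records (31, 32, 34, ...) should hang off the preceding 30 record instead of being its siblings, only the comparison in the `while` loop needs to change. Note that this version keeps the whole tree in memory, so it suits small files or one subtree at a time.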
If the file is big, you should not try to process the whole thing in memory; save the parsed parts somewhere (a database) as you go.
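One way to do that, sketched in Python with the standard `sqlite3` module (the table name `records`, its columns and the function `load_records` are hypothetical, not something from the question): read the file line by line, write each record straight to the database with a `parent_id` pointing at its parent row, and remember only the id of the latest row on each level. This assumes every line starts with two digits and that the level is the first of them.

```python
import sqlite3
from typing import Dict, Iterable, Optional


def load_records(lines: Iterable[str], conn: sqlite3.Connection,
                 batch_size: int = 10_000) -> int:
    """Stream records into a table, linking each row to its parent row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES records(id),"
        " record_type TEXT NOT NULL,"
        " line TEXT NOT NULL)"
    )
    latest: Dict[int, int] = {}      # level -> id of the most recent open row
    count = 0
    for raw in lines:
        line = raw.rstrip("\r\n")
        if len(line) < 2:
            continue                 # skip blank lines
        level = int(line[:2]) // 10
        # The parent is the most recent row on the nearest lower level.
        parent_id: Optional[int] = None
        for lower in range(level - 1, 0, -1):
            if lower in latest:
                parent_id = latest[lower]
                break
        cur = conn.execute(
            "INSERT INTO records (parent_id, record_type, line) VALUES (?, ?, ?)",
            (parent_id, line[:2], line),
        )
        latest[level] = cur.lastrowid
        # Deeper levels belonged to the previous subtree, so forget them.
        for deeper in [k for k in latest if k > level]:
            del latest[deeper]
        count += 1
        if count % batch_size == 0:
            conn.commit()            # flush periodically to bound the transaction
    conn.commit()
    return count
```

Passing an open file object as `lines` keeps memory use flat regardless of file size, since a file object yields one line at a time. A real schema would probably use one table per record type rather than a single generic table.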
There are some interesting approaches for handling the parsing of the CSV grammar. You could use LINQ, although it does not tend to give very helpful error messages when there is a parsing problem. Or you could use ExpandoObjects as described here. Another way would be to use a parser combinator library like Sprache. Regardless, these approaches are likely to run into memory problems if you try to handle the whole file at once. My advice would be to consider them for parsing the individual lines.
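For the per-line step, a plain table of column offsets may already be enough. The following Python sketch slices one fixed-length record into named fields and reports the line number when something does not fit. The layout shown for record type 12 (field names and widths) is purely hypothetical, since the real column positions must come from the file's specification; `Field`, `LAYOUTS` and `parse_line` are likewise illustrative names.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    name: str
    start: int    # zero-based offset of the first character
    length: int


# Hypothetical layout for illustration only; the real column widths
# must be taken from the file's specification.
LAYOUTS: Dict[str, List[Field]] = {
    "12": [Field("code", 2, 3), Field("name", 5, 25), Field("value", 30, 2)],
}


def parse_line(line: str, line_no: int) -> Dict[str, str]:
    """Slice one fixed-length record into named, whitespace-trimmed fields."""
    record_type = line[:2]
    layout = LAYOUTS.get(record_type)
    if layout is None:
        raise ValueError(f"line {line_no}: unknown record type {record_type!r}")
    expected = max(f.start + f.length for f in layout)
    if len(line) < expected:
        raise ValueError(
            f"line {line_no}: record type {record_type} needs {expected} "
            f"characters, got {len(line)}"
        )
    fields = {"record_type": record_type}
    for f in layout:
        fields[f.name] = line[f.start:f.start + f.length].strip()
    return fields
```

Because each call sees a single line, memory use does not depend on the file size, and the error messages point at the offending line.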