I’m trying to parse some data in a fixed-format text file where each “record” is spread over a number of lines, like so:
MAILBOX: 10013 Created: 01/20/09 4:39 pm
MSGS: 0 UNPLAYED: 0 URGENT: 0 RECEIPT: 0
LCOS: RBC Standard : 20 FCOS: RBC Standard : 20
GCOS: Default GCOS 1 : 1 NCOS: Default : 1
TCOS: Default TCOS 1 : 1 RCOS: : 1
BAD LOGS: 0 LAST LOG: NEVER MINS: 0.0
PASSWD: Y TUTOR: N DAY: M NIGHT: M
NAME: CODE:
EXTEN: 10013 INDEX: 0
ATTEN DN: INDEX: 0
DISTRIBUTION LISTS WITH CHANGE RIGHTS:
all
DISTRIBUTION LISTS WITH REVIEW RIGHTS:
all
I’ve used FileHelpers before for single-line records, and it’s been very useful. Checking its documentation, it does have a MultiRecordEngine feature, but this is going to mean:
- a class for each line … not a problem
- calculating the exact size of each fixed format field … painful and open to error
- logic to check each line
A further wrinkle is that the fixed format is not actually fixed: the set of lines varies with the target record, so some records have 21 lines, some 22, 23, 24, etc.
I have found a Java flat file parsing library, FFP, but I’m a .NET, C#, PowerShell coder.
Are there better ways of handling this sort of parsing?
What you need is a lexer. Your record is too big to parse with a single regex, so you have to write one regex for each line type, plus a state machine to validate that the lines follow in the right order.
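As a rough illustration of the regex-per-line plus state-machine idea, here is a minimal sketch. It is written in Python only to keep it short; the same structure should carry over to .NET's Regex class with named groups. Everything in it is an assumption made for illustration: the names LINE_PATTERNS, ORDER, classify and parse_records are hypothetical, only four of the line types from the sample are covered, and fields are located by their labels rather than by column position, since the real column widths are not shown in the question.

```python
import re

# One regex per line type. Only four line types from the sample record are
# covered; the others would follow the same shape. Fields are found by their
# labels, so no column widths need to be counted.
LINE_PATTERNS = [
    ("mailbox", re.compile(
        r"^MAILBOX:\s*(?P<mailbox>\d+)\s+Created:\s*(?P<created>.+?)\s*$")),
    ("counts", re.compile(
        r"^MSGS:\s*(?P<msgs>\d+)\s+UNPLAYED:\s*(?P<unplayed>\d+)"
        r"\s+URGENT:\s*(?P<urgent>\d+)\s+RECEIPT:\s*(?P<receipt>\d+)\s*$")),
    ("exten", re.compile(
        r"^EXTEN:\s*(?P<exten>\d*)\s+INDEX:\s*(?P<exten_index>\d+)\s*$")),
    ("list_header", re.compile(
        r"^DISTRIBUTION LISTS WITH (?P<rights>CHANGE|REVIEW) RIGHTS:\s*$")),
]

# Expected relative order of the known line types within one record.
ORDER = {"mailbox": 0, "counts": 1, "exten": 2, "list_header": 3}


def classify(line):
    """Return (kind, fields) for the first matching pattern, else (None, {})."""
    for kind, pattern in LINE_PATTERNS:
        match = pattern.match(line)
        if match:
            return kind, match.groupdict()
    return None, {}


def parse_records(lines):
    """Group lines into records, checking that known lines arrive in order."""
    records = []
    current = None      # record being filled
    last_rank = -1      # rank of the last known line seen in this record
    open_list = None    # distribution list currently collecting entries
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        kind, fields = classify(line)
        if kind == "mailbox":
            # A MAILBOX line always starts a new record.
            current = {"lists": {}}
            records.append(current)
            last_rank, open_list = -1, None
        elif current is None:
            raise ValueError(f"line {number}: data before first MAILBOX line")
        if kind is None:
            # Unrecognised line: a list entry if a list is open,
            # otherwise skipped in this sketch.
            if open_list is not None:
                current["lists"][open_list].append(line)
            continue
        rank = ORDER[kind]
        if rank < last_rank:
            raise ValueError(f"line {number}: {kind!r} line out of order")
        last_rank = rank
        if kind == "list_header":
            open_list = fields["rights"].lower()
            current["lists"][open_list] = []
        else:
            open_list = None
            current.update(fields)
    return records
```

Because a record is delimited by its MAILBOX line rather than by a line count, records of 21, 22, 23 or more lines are handled the same way. A stricter version would reject unrecognised lines instead of skipping them, and would need a rule for telling list entries apart from labelled lines that can follow a list.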
Or you can use a general-purpose lexer/parser generator to produce the code for you. Wikipedia has a long list. The Gold parser looks like a good candidate.
I would not try to do the lexing/parsing in PowerShell. I would rather write the code in C# or F# and use the assembly from PowerShell.
Edit: I’ve just looked at the FileHelpers library. You could create a MultiRecordEngine with a .NET type that matches each line in your source record. All you have to do then is check the result array for valid order and create objects.