I’m trying to parse a file that looks like this:
|| Column Header A || Column Header B || Column Header C ||CRLF
| Data A | Data B | Data C |CRLF
| Data A | Data B | Data C |CRLF
(“CRLF” represents a line break)
I had code that parsed this fine. First, I split the file into an array of lines:
string[] lines = fileString.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
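Because this `Split` overload receives a `char[]`, it splits on `'\r'` and `'\n'` individually, so every CRLF yields an empty entry between the two characters, and `RemoveEmptyEntries` is what throws it away. Also note that `Environment.NewLine` is only `"\r\n"` on Windows (it is `"\n"` elsewhere). The minimal sketch below uses an explicit `{ '\r', '\n' }` array to illustrate the behavior on any platform; the class and method names are hypothetical.

```csharp
using System;

public static class LineSplitDemo
{
    // Same characters Environment.NewLine.ToCharArray() yields on Windows,
    // spelled out so the demo behaves identically on every platform.
    private static readonly char[] LineBreakChars = { '\r', '\n' };

    // Without RemoveEmptyEntries: "a\r\nb" -> "a", "", "b"
    // (the empty entry is the gap between '\r' and '\n').
    public static string[] SplitKeepingEmpty(string text) =>
        text.Split(LineBreakChars);

    // With RemoveEmptyEntries: "a\r\nb" -> "a", "b"
    public static string[] SplitRemovingEmpty(string text) =>
        text.Split(LineBreakChars, StringSplitOptions.RemoveEmptyEntries);
}
```

The side effect matters later: any logic that relies on counting entries assumes that one entry equals one record, which only holds while no field contains a line break.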
Then I parse each row into an array of column values. For the header row, I use:
string Delimiter = "||";
string[] columns = line.Split(new string[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries);
For the remaining rows, I use:
string Delimiter = "|";
string[] columns = line.Split(new string[] { Delimiter }, StringSplitOptions.RemoveEmptyEntries);
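For reference, the two `Split` calls above can be wrapped into a small helper. This is a sketch, not the original code: the class and method names are hypothetical, and the `Trim` step is an addition that strips the padding spaces the original leaves around each value (`" Data A "` becomes `"Data A"`).

```csharp
using System;
using System.Linq;

public static class PipeTableParser
{
    // Header cells are delimited by "||".
    public static string[] ParseHeader(string line) => SplitCells(line, "||");

    // Data cells are delimited by a single "|".
    public static string[] ParseRow(string line) => SplitCells(line, "|");

    // RemoveEmptyEntries drops the empty pieces before the first delimiter
    // and after the last one; Trim then removes the padding spaces.
    private static string[] SplitCells(string line, string delimiter) =>
        line.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries)
            .Select(cell => cell.Trim())
            .ToArray();
}
```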
This worked perfectly until I hit a record with a CRLF inside one of its fields, which broke my parsing.
Can anyone suggest a good way to parse data like the example below that handles embedded CRLFs correctly?
|| Column Header A || Column Header B || Column Header C ||CRLF
| Data A | Data B | Data C |CRLF
| Data A | Data B CRLF Continued B | Data C |CRLF
The issue is that the initial split into lines gives me 4 entries here instead of 3, because the last record shows up as two entries in that array.
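A minimal sketch of one way to rejoin the broken records, assuming (as in the examples) that no field ever contains a `|` character. Under that assumption a complete data row always has exactly one more pipe than there are columns, so physical lines can be accumulated until that count is reached. The class and method names are hypothetical, and splitting only on the exact `"\r\n"` sequence (rather than on individual characters) is a choice made here so the embedded break can be put back into the field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class MultiLineRecordParser
{
    private const string Crlf = "\r\n";

    // Returns the header cells and the data rows, re-joining physical lines
    // that belong to one record because a field contained a CRLF.
    // Assumes no field contains '|'.
    public static (string[] Header, List<string[]> Rows) Parse(string fileText)
    {
        string[] physicalLines = fileText.Split(new[] { Crlf }, StringSplitOptions.None);

        string[] header = SplitCells(physicalLines[0], "||");
        int pipesPerRow = header.Length + 1; // "| A | B | C |" has 4 pipes for 3 columns

        var rows = new List<string[]>();
        var pending = new StringBuilder();
        bool hasPending = false;

        foreach (string line in physicalLines.Skip(1))
        {
            // Skip the empty piece after the final CRLF and blank lines between records.
            if (!hasPending && line.Length == 0) continue;

            if (hasPending) pending.Append(Crlf); // restore the break inside the field
            pending.Append(line);
            hasPending = true;

            // The record is complete once it holds a full row's worth of pipes.
            if (CountPipes(pending) >= pipesPerRow)
            {
                rows.Add(SplitCells(pending.ToString(), "|"));
                pending.Clear();
                hasPending = false;
            }
        }

        if (hasPending)
            throw new FormatException("The last record ended before all of its columns were read.");

        return (header, rows);
    }

    private static int CountPipes(StringBuilder sb)
    {
        int count = 0;
        for (int i = 0; i < sb.Length; i++)
            if (sb[i] == '|') count++;
        return count;
    }

    private static string[] SplitCells(string line, string delimiter) =>
        line.Split(new[] { delimiter }, StringSplitOptions.RemoveEmptyEntries)
            .Select(cell => cell.Trim())
            .ToArray();
}
```

Counting pipes, rather than checking whether a line ends in `|`, also copes with a break that falls directly after a delimiter (e.g. `| Data A |` followed by a CRLF), which an "ends with a pipe" test would wrongly treat as a complete row.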
Not exactly elegant, but this brute-force solution is the first that comes to mind: split, then recombine any row that comes out short: