I am trying to parse a tab-delimited file using a regex split. It works well when '\t' acts as the delimiter, but some lines contain '\t' characters inside a field.
For example:
G2226 TEST 1 C 29 Internal Head Office D Head Office ZZZ Unassigned 10910 10/10/2011 11/10/2011 10/10/2011 11/10/2011 "Test call Sort the customer out some data. See the customer again tomorrow to talk about Prod " Mr ABC Mr ABC Mr ABC Mr ABC Credit Requested BDM Call Internal Note 10
This field contains 2 tabs that I would like to be ignored:
"Test call Sort the customer out some data. See the customer again tomorrow to talk about Prod\t\t"
The good thing is that these fields are enclosed in double quotes, but I cannot work out how to ignore the tabs inside them. Any ideas?
Edit:
My goal is to get 36 columns per line. Some lines produce more than that after Regex.Split(lineString, "\t") because some of their fields contain '\t' characters, and I would like those tabs to be ignored. The line above comes out as 38 columns, which my DataTable rejects because the header has only 36 columns.
If you have a simple CSV file, then regex split is usually the easiest way to process it.
However, if your CSV file contains more complex elements, such as quoted fields that contain separator characters or newlines, then this approach will no longer work. It is not a trivial matter to correctly parse these types of files, so you should use a library when possible.
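To illustrate what "quoted fields that contain separator characters" means for the split, here is a minimal sketch of a quote-aware splitter. The class and method names (`QuoteAwareSplitter`, `SplitTabLine`) are made up for this example, and it assumes that quotes always come in pairs on a single line; it does not handle escaped quotes (`""`) or fields that span several lines, which is exactly why a library is the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Hypothetical helper: splits one line on tabs, but keeps tabs that
    // appear between a pair of double quotes as part of the field.
    public static List<string> SplitTabLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            if (c == '"')
            {
                // Toggle the quoted state; the quote character itself is dropped.
                inQuotes = !inQuotes;
            }
            else if (c == '\t' && !inQuotes)
            {
                // An unquoted tab ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                // Any other character, including a quoted tab, belongs to the field.
                current.Append(c);
            }
        }

        // The last field has no trailing delimiter.
        fields.Add(current.ToString());
        return fields;
    }
}
```

Calling `QuoteAwareSplitter.SplitTabLine(lineString)` in place of the regex split should keep a field such as "Test call ... Prod\t\t" in one piece, so the sample line would no longer gain two extra columns.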
The answers to this question give several options for C# libraries that can read a CSV file.
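As one example of the library route, here is a sketch using `TextFieldParser` from the `Microsoft.VisualBasic.FileIO` namespace, which ships with .NET and can be used from C#. The wrapper class and method (`TabFileReader`, `ReadRows`) are names invented for this example, and whether this parser copes with every line in your file is something you would need to check against your data.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TabFileReader
{
    // Hypothetical wrapper: reads every row of a tab-delimited source,
    // letting the parser deal with fields enclosed in double quotes.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");
            // Tabs inside a quoted field are not treated as delimiters.
            parser.HasFieldsEnclosedInQuotes = true;
            // Keep leading and trailing whitespace in field values.
            parser.TrimWhiteSpace = false;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

You could pass it `new StreamReader(path)` and then check that each returned row has 36 entries before adding it to the DataTable.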