I have inherited some code that uses regular expressions to parse CSV-formatted data. Until now it didn’t need to cope with empty string fields, but the requirements have changed and empty fields are now possible.
I have changed the regular expression from this:
new Regex("((?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
to this:
new Regex("((?<field>[^\",\\r\\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
(i.e. I have changed the + to *)
The problem is that I now get an extra empty field at the end, e.g. “ID,Name,Description” returns four fields: “ID”, “Name”, “Description” and “”.
Can anyone spot why?
This one:
I moved the handling of “blank” fields to a third “or”. The handling of "" already worked (you didn’t need to modify it; it was the second (?<field>) block of your code), so what you need to handle are four cases:
And this one should do it:
An empty field must be preceded by the beginning of the row (^) or by a comma (,), must be of length zero (there isn’t anything in the (?<field>) capture), and must be followed by a comma (,) or by the end of the line ($).
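To make the rule above concrete, here is a minimal sketch of a pattern built along those lines, compared with the * version from the question. It is an illustration under stated assumptions, not necessarily the exact pattern intended above: the names CsvFieldDemo, StarPattern, ThirdOrPattern and Fields are made up for this example, each input string is assumed to hold a single row, and the quoted alternative uses * so that a quoted empty field ("") is accepted. The extra field in the * version appears because, once “Description” has been matched up to $, the engine tries again at the end of the input, where [^",\r\n]* matches the empty string and $ matches a second time.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvFieldDemo
{
    // The pattern from the question after changing + to *.
    public const string StarPattern =
        "((?<field>[^\",\\r\\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))";

    // Hypothetical pattern with the empty field handled by a third "or".
    // Assumption: one row per input string, so ^ and $ mean start/end of the row.
    public const string ThirdOrPattern =
        "((?<field>[^\",\\r\\n]+)" +          // unquoted, non-empty field
        "|\"(?<field>([^\"]|\"\")*)\"" +      // quoted field (may be empty)
        "|(?<=^|,)(?<field>)(?=,|$))" +       // empty field between delimiters
        "(,|(?<rowbreak>\\r\\n|\\n|$))";

    // Returns the value of the "field" group for every match in the row.
    public static string[] Fields(string pattern, string row) =>
        Regex.Matches(row, pattern)
             .Cast<Match>()
             .Select(m => m.Groups["field"].Value)
             .ToArray();

    public static void Main()
    {
        var rows = new[] { "ID,Name,Description", "ID,,Description", ",Name," };
        foreach (var row in rows)
        {
            Console.WriteLine(
                $"{row}: * version = {Fields(StarPattern, row).Length} fields, " +
                $"third-or version = {Fields(ThirdOrPattern, row).Length} fields");
        }
    }
}
```

For “ID,Name,Description” the * version should report four fields and the third-“or” version three. The lookbehind (?<=^|,) is what stops the empty alternative from matching at the very end of the row after a non-empty field.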