I need to search for lines in a CSV file that end in an unterminated, double-quoted string.
For example:
1,2,a,b,"dog","rabbit
would match whereas
1,2,a,b,"dog","rabbit","cat bird"
1,2,a,b,"dog",rabbit
would not.
I have very limited experience with regular expressions, and the only thing I could think of is something like
"[^"]*$
However, that matches the last quote to the end of the line.
How would this be done?
Assuming quotes can’t be escaped, you need to test the parity of quotes (making sure that there’s an even number of them instead of odd). Regular expressions are great for that:
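As a minimal sketch of such a parity test (a reconstruction under the stated no-escaping assumption, not necessarily the exact expression meant here), the pattern below allows any number of quote pairs followed by a quote-free tail. The names `EVEN_QUOTES` and `has_even_quotes` are made up for illustration.

```python
import re

# Zero or more groups that each contain exactly two quotes,
# then a tail with no quotes at all, anchored to the whole line.
EVEN_QUOTES = re.compile(r'^(?:[^"]*"[^"]*")*[^"]*$')

def has_even_quotes(line: str) -> bool:
    """Return True if the line contains an even number of double quotes."""
    return EVEN_QUOTES.match(line) is not None

for sample in ['1,2,a,b,"dog","rabbit',
               '1,2,a,b,"dog","rabbit","cat bird"',
               '1,2,a,b,"dog",rabbit']:
    print(has_even_quotes(sample), sample)
```

If a regular expression is not strictly required, `line.count('"') % 2 == 0` gives the same answer under the same assumption.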
That will match all lines with an even number of quotes. You can invert the result for all strings with an odd number. Or you can just add another ([^"]*") part at the beginning:
Similarly, if you have access to reluctant operators instead of greedy ones, you can use a simpler-looking expression:
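The expressions themselves are not shown here, so the following is only a sketch of the "one extra quote in front" idea: an even-parity pattern (pairs of quotes) prefixed with a single `[^"]*"` group, which makes the total odd. No reluctant form is reconstructed. One caution if you write your own: in a backtracking engine a lazy `.*?` can still expand across a quote, so a naive lazy rewrite such as the hypothetical `NAIVE_LAZY_EVEN` below accepts some odd-count lines too. All names are illustrative.

```python
import re

# One leading quote, then any number of quote pairs, then a quote-free tail:
# the line matches only if it holds an odd number of double quotes.
ODD_QUOTES = re.compile(r'^[^"]*"(?:[^"]*"[^"]*")*[^"]*$')

# Caution: naive lazy version of an even-parity test. `.*?` is allowed to
# backtrack over a quote, so this can match lines with an odd count as well.
NAIVE_LAZY_EVEN = re.compile(r'^(?:.*?".*?")*[^"]*$')

def has_odd_quotes(line: str) -> bool:
    """Return True if the line contains an odd number of double quotes."""
    return ODD_QUOTES.match(line) is not None

lines = ['1,2,a,b,"dog","rabbit',
         '1,2,a,b,"dog","rabbit","cat bird"',
         '1,2,a,b,"dog",rabbit']

# Keep only the lines that end inside an unterminated quoted string.
unterminated = [ln for ln in lines if has_odd_quotes(ln)]
print(unterminated)
```

Using `[^"]` rather than `.` is what keeps each piece from swallowing a quote, which is why the greedy form stays reliable.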
Now, if quotes can be escaped, it’s a different question entirely, but the approach would be similar: determine the parity of unescaped quotes.
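How that looks depends on the escaping convention, which is not specified here. If quotes are escaped CSV-style by doubling them (""), each escape adds two quotes and the plain parity test still works unchanged. The sketch below instead assumes backslash escaping, where \" is not a delimiter; `_UNIT`, `ODD_UNESCAPED` and `has_odd_unescaped_quotes` are hypothetical names.

```python
import re

# Assumption: a backslash escapes the character that follows it.
# One "unit" is either an ordinary character or a backslash plus one character.
_UNIT = r'(?:[^"\\]|\\.)'

# One unescaped quote, then any number of unescaped quote pairs,
# then a tail without unescaped quotes: an odd number of unescaped quotes.
ODD_UNESCAPED = re.compile(rf'^{_UNIT}*"(?:{_UNIT}*"{_UNIT}*")*{_UNIT}*$')

def has_odd_unescaped_quotes(line: str) -> bool:
    """Return True if the line has an odd number of unescaped double quotes."""
    return ODD_UNESCAPED.match(line) is not None

print(has_odd_unescaped_quotes(r'a,"he said \"hi'))      # string left open
print(has_odd_unescaped_quotes(r'a,"he said \"hi\""'))   # string closed
```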