I need a regex that will parse a CSV-style file about 57 fields wide. Fields are separated by commas, and most (but maybe not all) are enclosed in double quotes. A quoted field may contain embedded doubled quotes (""), each of which represents a single double-quote character in the evaluated string.
I’m somewhere between a regex beginner and intermediate. I think I can quickly get to a basic expression for splitting out the fields, but it’s the embedded double quotes (and commas inside quoted fields) that I can’t get my head around.
Anyone? (Not that it matters, but the specific language is Matlab.)
If you really have to do it with a regex, I would do it in two passes. First, separate the fields by splitting on commas with something such as `(?<!\\),`.
This should split on commas only when they are not preceded by a backslash (I’m assuming that’s what you mean by escaped commas). I think in Matlab you’ll end up with an array of indices into the original string.
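A minimal sketch of that first pass, written in Python rather than Matlab so it can be run and checked. It assumes, as above, that a backslash before a comma (`\,`) marks a comma belonging inside a field; `UNESCAPED_COMMA` and `split_fields` are hypothetical names. The Matlab counterpart should be roughly `regexp(line, '(?<!\\),', 'split')`, which returns a cell array of fields instead of indices, though I haven’t run that.

```python
import re

# Negative lookbehind: match a comma only if the character before it
# is NOT a backslash. Assumes "\," is how a literal comma is escaped.
UNESCAPED_COMMA = re.compile(r'(?<!\\),')

def split_fields(line):
    """First pass: split a line into raw fields, leaving escapes in place."""
    return UNESCAPED_COMMA.split(line)
```

For example, `split_fields(r'a,b\,c,"d"')` gives `['a', 'b\\,c', '"d"']`. One limitation of the lookbehind: a field that genuinely ends in an escaped backslash (`\\,`) will not be split there.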
Then check each field for escaped quotes (`\"`) and replace them with plain quotes, using a pattern such as `\\"`.
Do the same for escaped commas (`\,`), using a pattern such as `\\,`.
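A minimal Python sketch of this second pass, under the same backslash-escape assumption. Stripping one pair of surrounding quotes first is my own addition (most fields in the question are quoted), and `unescape_field` is a hypothetical name. In Matlab the replacements would presumably be done with `regexprep`.

```python
import re

def unescape_field(field):
    """Second pass: drop surrounding quotes, then undo \\" and \\, escapes."""
    # Assumption: a field wrapped in a pair of double quotes is a quoted field.
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        field = field[1:-1]
    field = re.sub(r'\\"', '"', field)  # \"  ->  "
    field = re.sub(r'\\,', ',', field)  # \,  ->  ,
    return field
```

For example, `unescape_field(r'"say \"hi\"\, ok"')` gives `'say "hi", ok'`.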
I’m not sure whether the backslashes need to be doubly escaped in Matlab, as I haven’t had much experience with it.
As others have said, it’s probably better to use a CSV library to handle the initial file.
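To illustrate what a library buys you, here is a sketch using Python’s standard `csv` module (not Matlab; `parse_csv_text` is a hypothetical helper). With the default dialect, commas inside quoted fields are kept and a doubled `""` inside a quoted field becomes a single `"`, which is exactly the format described in the question.

```python
import csv
import io

def parse_csv_text(text):
    """Parse CSV text into a list of rows (lists of strings)."""
    # Default "excel" dialect: quotechar='"', doublequote=True,
    # so "" inside a quoted field is read as one literal quote.
    return list(csv.reader(io.StringIO(text)))
```

For example, `parse_csv_text('a,"b,c","d""e"\n')` gives `[['a', 'b,c', 'd"e']]`.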