The file structure is as follows:
"group","type","scope","name","attribute","value"
"c","","Probes Count","Counter","value","35"
"b","ProbeInformation","Probes Count","Gauge","value","0"
Values are always quoted, and there is a trailing newline as well.
Here is what I have:
^(\"[^,\"]*\")(,(\"[^,\"]*\"))*(.(\"[^,\"]*\")(,(\"[^,\"]*\")))*.$
That is not matching correctly. I’m using String.matches(regexp);
Disclaimer: I didn’t even try compiling my code, but this pattern has worked before.
When I can’t see at a glance what a regex does, I break it out into lines so it’s easier to figure out what’s going on. Mismatched parens are more obvious and you can even add comments to it. Also, let’s add the Java code around it so escaping oddities become clear.
becomes
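Here's a sketch of that before and after, assuming the pattern was written as one Java string literal and passed to String.matches. The class, constant, and method names are made up for illustration; the comments are my reading of what each piece was meant to do, not something stated in the question.

```java
// Illustrative only: RegexLayout and its member names are not from the original post.
class RegexLayout {
    // The pattern exactly as posted, as a single Java string literal.
    static final String ONE_LINE =
        "^(\"[^,\"]*\")(,(\"[^,\"]*\"))*(.(\"[^,\"]*\")(,(\"[^,\"]*\")))*.$";

    // The same pattern, one logical piece per line.
    static final String BROKEN_OUT =
        "^" +
        "(\"[^,\"]*\")" +          // a quoted value
        "(,(\"[^,\"]*\"))*" +      // comma plus quoted value, repeated
        "(" +
            "." +                  // any one character (presumably meant as the line break)
            "(\"[^,\"]*\")" +      // a quoted value
            "(,(\"[^,\"]*\"))" +   // comma plus quoted value
        ")*" +
        "." +                      // any one character (presumably the trailing newline)
        "$";

    // Same call as in the question, just with the broken-out pattern.
    static boolean check(String input) {
        return input.matches(BROKEN_OUT);
    }
}
```

Both constants hold the same string, so nothing about the matching behaviour has changed yet; only the layout has.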
Much better. Now to business: the first thing I see is your regex for the quoted values. It doesn’t allow for commas within the strings – which probably isn’t what you want – so let’s fix that. Let’s also put it in its own variable so we don’t mis-type it at some point. Lastly, let’s add comments so we can verify what the regex is doing.
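Something along these lines, as a sketch. QUOTED is a name I'm introducing here, and the version below assumes a quoted value never contains an escaped quote (CSV normally escapes a quote by doubling it, which this does not handle).

```java
// Illustrative only: QuotedValueSketch and REGEX are hypothetical names.
class QuotedValueSketch {
    // A double quote, any run of non-quote characters (commas included), a double quote.
    static final String QUOTED = "\"[^\"]*\"";

    static final String REGEX =
        "^" +
        "(" + QUOTED + ")" +           // first value on the first line
        "(,(" + QUOTED + "))*" +       // zero or more further values on that line
        "(" +
            "." +                      // one character, presumably standing in for the newline
            "(" + QUOTED + ")" +       // first value on a following line
            "(,(" + QUOTED + "))" +    // a second value
        ")*" +                         // zero or more following lines
        "." +                          // one character, presumably the trailing newline
        "$";
}
```

With the value pattern in one place, a fix to it applies everywhere it is used.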
Things are getting even clearer. I see two big things here:
1) (I think) you’re trying to match the newline in your input string. I’ll play along, but it’s cleaner and easier to split the input on a newline than what you’re doing (that’s an exercise you can do yourself though). You also need to be mindful of the different newline conventions that different operating systems have (read this).
2) You’re capturing too much. You want to use non-capturing groups or parsing your output is going to be difficult and error-prone (read this).
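To make both points concrete, here's a small sketch. NEWLINE, CAPTURING, and NON_CAPTURING are names I'm assuming for illustration. The newline pattern covers the three common conventions (CRLF, CR, LF), and the two line patterns accept the same text but differ in how many numbered groups they produce.

```java
import java.util.regex.Pattern;

// Illustrative only: the names below are hypothetical.
class GroupingSketch {
    static final String QUOTED = "\"[^\"]*\"";

    // CRLF is listed first so a Windows line ending is consumed as one terminator.
    static final String NEWLINE = "(?:\\r\\n|\\r|\\n)";

    // Capturing version: every pair of plain parentheses becomes a numbered group.
    static final String CAPTURING = "(" + QUOTED + ")(,(" + QUOTED + "))*";

    // Non-capturing version: (?:...) groups for repetition without numbering.
    static final String NON_CAPTURING = QUOTED + "(?:," + QUOTED + ")*";

    // Number of capturing groups the compiled pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount(CAPTURING));      // prints 3
        System.out.println(groupCount(NON_CAPTURING));  // prints 0
    }
}
```

Note that a capturing group inside a repetition only keeps its last iteration anyway, so the extra groups in the first version buy you very little.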
From here, I see you duplicating work again. Let’s fix that. This also fixes a missing * in your original regex. See if you can find it.
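One way the de-duplicated version could look, as a sketch. Apart from LINE, the names (QUOTED, NEWLINE, FILE, SAMPLE) are my own assumptions, and the ^ and $ anchors are left out because String.matches already requires the whole input to match.

```java
// Illustrative only: CsvRegexSketch and its constant names are hypothetical.
class CsvRegexSketch {
    static final String QUOTED  = "\"[^\"]*\"";                  // one quoted value
    static final String NEWLINE = "(?:\\r\\n|\\r|\\n)";          // CRLF, CR, or LF
    static final String LINE    = QUOTED + "(?:," + QUOTED + ")*"; // one or more values
    // One or more lines separated by newlines, then the trailing newline.
    static final String FILE    = LINE + "(?:" + NEWLINE + LINE + ")*" + NEWLINE;

    // The three sample rows from the question, trailing newline included.
    static final String SAMPLE =
        "\"group\",\"type\",\"scope\",\"name\",\"attribute\",\"value\"\n" +
        "\"c\",\"\",\"Probes Count\",\"Counter\",\"value\",\"35\"\n" +
        "\"b\",\"ProbeInformation\",\"Probes Count\",\"Gauge\",\"value\",\"0\"\n";

    public static void main(String[] args) {
        System.out.println(SAMPLE.matches(FILE));  // prints true
    }
}
```

Because LINE is defined once and reused, the first line and every following line get the same repetition.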
That’s a little easier to read, no? Now you can test your big nasty regex in pieces if it doesn’t work.
You can now compile the regex, get the matcher, and grab the groups from it. You still have a few issues though:
1) I said earlier that it would be easier to break on newlines. One reason is: how do you determine how many values you have per line? Hard-coding it will work, but it’ll break as soon as your input changes. Maybe this isn’t a problem for you, but it’s still bad practice. Another reason: the regex is still too complex for my liking. You could really get away with stopping at LINE.
2) CSV files also allow lines with unquoted values, such as bare numbers. To handle this, you might want to add another mini-regex that matches either a quoted value or a run of digits.
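Pulling both points together, here's one possible sketch: split the input on line breaks, validate each line, then walk the values with a Matcher and grab the groups. Everything named below (CsvLineSketch, VALUE, parseLine, parse) is hypothetical, and the row used in the usage note is just a variant of the sample data with the last value unquoted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: all names here are hypothetical.
class CsvLineSketch {
    // Non-capturing form, used only to validate a whole line.
    static final String VALUE_RE = "(?:\"[^\"]*\"|\\d+)";
    static final String LINE = VALUE_RE + "(?:," + VALUE_RE + ")*";

    // Capturing form: group 1 is the text inside the quotes, group 2 is a bare run of digits.
    static final Pattern VALUE = Pattern.compile("\"([^\"]*)\"|(\\d+)");

    // Returns the values of one line, without the surrounding quotes.
    static List<String> parseLine(String line) {
        if (!line.matches(LINE)) {
            throw new IllegalArgumentException("Not a valid line: " + line);
        }
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(line);
        while (m.find()) {
            values.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return values;
    }

    // Splits on CRLF, CR, or LF and parses every non-empty line.
    static List<List<String>> parse(String input) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : input.split("\\r\\n|\\r|\\n")) {
            if (!line.isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }
}
```

For example, calling parseLine on `"c","","Probes Count","Counter","value",35` gives six values, with an empty string in second place and 35 last, however many columns the other lines have. It still assumes quoted values never contain an escaped quote.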