I need some help to model this regular expression. I think it’ll be easier with an example. I need a regular expression that matches a comma, but only if it’s not inside this structure: "( )", like this:
,a,b,c,d,"("x","y",z)",e,f,g,
Then the first five and the last four commas should match the expression; the two between x, y and z inside the ( ) section shouldn't.
I tried a lot of combinations, but regular expressions are still a little foggy for me.
I want to use it with the split method in Java. The example is short, but the input can be much longer and have more than one section between "( and )". The split method receives an expression, and any text that matches it (in this case the comma) is treated as the separator.
So, I want to do something like this:
String keys[] = row.split(expr);
System.out.println(keys[0]); // print a
System.out.println(keys[1]); // print b
System.out.println(keys[2]); // print c
System.out.println(keys[3]); // print d
System.out.println(keys[4]); // print "("x","y",z)"
System.out.println(keys[5]); // print e
System.out.println(keys[6]); // print f
System.out.println(keys[7]); // print g
Thanks!
You can do this with a negative lookahead. Here’s a slightly simplified problem to illustrate the idea:
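A minimal sketch of what such a simplified example might look like. The input string, the class name `SplitOutsideBrackets` and the helper `splitOutside` are made up for illustration:

```java
import java.util.Arrays;

public class SplitOutsideBrackets {
    // Split on ';' unless the first bracket to its right is a closing '>'.
    static String[] splitOutside(String s) {
        return s.split(";(?![^<>]*>)");
    }

    public static void main(String[] args) {
        // Made-up sample input with two bracketed sections.
        String s = "a;b;c;d;<x;y;z>;e;f;g;<p;q;r;s>;h;i;j";
        System.out.println(Arrays.toString(splitOutside(s)));
        // prints [a, b, c, d, <x;y;z>, e, f, g, <p;q;r;s>, h, i, j]
    }
}
```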
Note that instead of `,`, the delimiter is now `;`, and instead of `"(` and `)"`, the parentheses are simply `<` and `>`, but the idea still works.
On the pattern
The `[…]` is a character class. Something like `[aeiou]` matches any one of the lowercase vowels. `[^…]` is a negated character class: `[^aeiou]` matches any one character except the lowercase vowels.
The `*` repetition specifier matches the preceding pattern zero or more times.
The `(?!…)` is a negative lookahead; it asserts that a certain pattern DOES NOT match, looking ahead (i.e. to the right) of the current position.
The pattern `[^<>]*>` matches a (possibly empty) sequence of anything except parentheses, followed by a closing parenthesis.
Putting all of the above together, we get `;(?![^<>]*>)`, which matches a `;`, but only if we can't see a closing parenthesis as the first parenthesis to its right, because seeing one would mean that the `;` is "inside" the parentheses.
This technique, with some modifications, can be adapted to the original problem. Remember to escape the regex metacharacters `(` and `)` as necessary, and of course `"` and `\` in a Java string literal must be escaped by preceding them with a `\`.
You can also make the `*` possessive to try to improve performance, i.e. `;(?![^<>]*+>)`.
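One possible adaptation to the original input, as a sketch rather than a definitive solution. It assumes that `(` and `)` appear only as part of the `"(` … `)"` sections and that those sections never nest. The class name `SplitOutsideParens` and the helper `splitRow` are hypothetical:

```java
import java.util.Arrays;

public class SplitOutsideParens {
    // Split on ',' unless the first parenthesis to its right is a closing ')'.
    // Assumption: parentheses occur only around the "( ... )" sections, unnested.
    static String[] splitRow(String row) {
        // Regex: ,(?![^()]*+\))  -- the backslash is doubled in the Java literal.
        return row.split(",(?![^()]*+\\))");
    }

    public static void main(String[] args) {
        String row = ",a,b,c,d,\"(\"x\",\"y\",z)\",e,f,g,";
        String[] keys = splitRow(row);
        System.out.println(Arrays.toString(keys));
        // prints [, a, b, c, d, "("x","y",z)", e, f, g]
    }
}
```

Note that because the row starts with a comma, `split` returns an empty string as the first element, so `a` ends up at index 1, not 0. Trailing empty strings are dropped by `split` by default.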