Given the string:
© 2010 Women’s Flat Track Derby Association (WFTDA)
I want:
2010 -- Women's -- Flat
Women's -- Flat -- Track
Track -- Derby -- Association
I’m using regex:
([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)
It’s only returning:
s -- Flat -- Track
This problem isn’t straightforward, but to understand why, you need to understand how the regular expression engine operates on your string.
Let’s consider the pattern `[a-z]{3}` (match 3 successive characters between a and z) on the target string `abcdef`. The engine starts from the left side of the string (before the `a`), and sees that `a` matches `[a-z]`, so it advances one position. Then, it sees that `b` matches `[a-z]` and advances again. Finally, it sees that `c` matches, advances again (to before `d`) and returns `abc` as a match.
If the engine is set up to return multiple matches, it will now try to match again, but it keeps its positional information (so, like above, it’ll match and return `def`).
Because the engine has already moved past the `b` while matching `abc`, `bcd` will never be considered as a match. For this same reason, in your expression, once a group of words is matched, the engine will never consider words within the first match to be a part of the next one.
In order to get around this, you need to use capturing groups inside of lookaheads to collect matching words that appear later in the string:
This results in:
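A minimal sketch of the lookahead approach and the triples it yields. The pattern is reconstructed from the description that follows (a word, then a lookahead capturing the next two words), so it is an assumption and not necessarily the exact expression used in the linked fiddle. The names `WORD`, `pattern` and `wordTriples` are hypothetical, and the character class is assumed to accept both the straight and the curly apostrophe so that `Women’s` counts as one word.

```javascript
// Assumed word definition: letters, digits, straight or curly apostrophe.
const WORD = "[a-z0-9'’]+";

// Group 1 consumes one word; the lookahead captures the next two words
// into groups 2 and 3 without advancing the engine's cursor.
const pattern = new RegExp(`(${WORD})(?=\\s+(${WORD})\\s+(${WORD}))`, "gi");

function wordTriples(text) {
  const triples = [];
  pattern.lastIndex = 0; // a global regex keeps its position between calls
  let m;
  while ((m = pattern.exec(text)) !== null) {
    triples.push(`${m[1]} -- ${m[2]} -- ${m[3]}`);
  }
  return triples;
}

const input = "© 2010 Women’s Flat Track Derby Association (WFTDA)";
console.log(wordTriples(input).join("\n"));
// 2010 -- Women’s -- Flat
// Women’s -- Flat -- Track
// Flat -- Track -- Derby
// Track -- Derby -- Association
```

Only the first word of each triple is consumed, so the next search starts at the second word and overlapping triples become possible. No triple starts at `Derby` because `(WFTDA)` begins with a parenthesis, which the assumed word class does not accept.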
See this in action at http://jsfiddle.net/jRgXm/.
The regular expression searches for what you seem to be defining as a word, `([a-z0-9']+)`, and captures it into subgroup 1. It then uses a lookahead (a zero-width assertion, so it doesn’t advance the engine’s cursor) that captures the next two words into subgroups 2 and 3.
However, if you are using the actual JavaScript engine, you must use `RegExp.exec` and loop over the results (see this question for a discussion of why) or use the newer `matchAll` method (ES2020). I don’t know how UltraEdit’s engine is implemented, but hopefully it can do a global search and also collect subgroups.
Just for completeness, here’s the example above using ES2020’s `matchAll` (the first element in each returned array is the total match, then the subsequent elements are the capture groups):
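A sketch of what the `matchAll` version might look like. The pattern is again an assumption based on the description above (including the curly apostrophe in the word class), and `tripleRe`, `text` and `matches` are illustrative names.

```javascript
// Assumed pattern: one consumed word, then a lookahead capturing two more.
const tripleRe = /([a-z0-9'’]+)(?=\s+([a-z0-9'’]+)\s+([a-z0-9'’]+))/gi;
const text = "© 2010 Women’s Flat Track Derby Association (WFTDA)";

// matchAll requires the g flag and yields one array per match:
// [totalMatch, group1, group2, group3]
const matches = [...text.matchAll(tripleRe)];

for (const [total, first, second, third] of matches) {
  // total equals first here, because the lookahead is zero-width
  console.log(`${first} -- ${second} -- ${third}`);
}
```

Spreading the iterator into an array is optional; it is done here only so the matches can be inspected more than once.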