I have a text that consists of information enclosed by a certain pattern. The only thing I know is the pattern: ‘${template.start}’ and ‘${template.end}’. To keep it simple, I will substitute ${template.start} and ${template.end} with ‘a’ in the example.
So one entry in the text would be:
aINFORMATIONHEREa
I do not know how many of these entries are concatenated in the text. So the following is correct too:
aFOOOOOOaaASDADaaASDSDADa
I want to write a regular expression to extract the information enclosed by the ‘a’s.
My first attempt was to do:
a(.*)a
which works as long as there is only one entry in the text. As soon as there is more than one entry, it fails, because .* is greedy and matches as much as it can. So using a(.*)a on aFOOOOOOaaASDADaaASDSDADa results in only one capturing group, containing everything between the first and the last character of the text (both of which are ‘a’):
FOOOOOOaaASDADaaASDSDAD
What I want to get is something like
captureGroup(0): aFOOOOOOaaASDADaaASDSDADa
captureGroup(1): FOOOOOO
captureGroup(2): ASDAD
captureGroup(3): ASDSDAD
It would be great to be able to extract each entry from the text and, from each entry, the information enclosed between the ‘a’s. By the way, I am using the QRegExp class of Qt4.
Any hints? Thanks! Markus
Multiple variations of this question have been asked before. Some related discussions:
- Regex to replace all \n in a String, but no those inside [code] [/code] tag
- Using regular expressions how do I find a pattern surrounded by two other patterns without including the surrounding strings?
- Use RegExp to match a parenthetical number then increment it
- Regex for splitting a string using space when not surrounded by single or double quotes
- What regex will match text excluding what lies within HTML tags?
and probably others…
Simply use non-greedy expressions, namely:
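To illustrate the difference between greedy and non-greedy matching, here is a minimal sketch in Python rather than Qt, using the sample text from the question. The helper `extract_entries` is a hypothetical name introduced only for this example; it escapes the delimiters because the real ones (`${template.start}`, `${template.end}`) contain regex metacharacters. Note that QRegExp in Qt4 does not accept the `*?` syntax; as far as the Qt documentation describes it, the equivalent there is to call `setMinimal(true)` on the QRegExp object, which makes all quantifiers in the pattern non-greedy.

```python
import re

text = "aFOOOOOOaaASDADaaASDSDADa"

# Greedy: .* extends to the LAST 'a', so everything ends up in one group.
greedy = re.findall(r"a(.*)a", text)

# Non-greedy: .*? stops at the FIRST 'a' that allows the match to succeed,
# so each entry is matched separately.
lazy = re.findall(r"a(.*?)a", text)


def extract_entries(text, start, end):
    """Return the text between each start/end delimiter pair.

    Hypothetical helper for illustration: re.escape() neutralises
    characters such as '$', '{' and '.' in the delimiters.
    """
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text)


print(greedy)  # ['FOOOOOOaaASDADaaASDSDAD']
print(lazy)    # ['FOOOOOO', 'ASDAD', 'ASDSDAD']
```

With the real delimiters, a call would look like `extract_entries(data, "${template.start}", "${template.end}")`. If the enclosed information can span several lines, the pattern would additionally need a flag such as `re.DOTALL`, since `.` does not match a newline by default in Python.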