I have the following regular expression that works fine in Perl:
Classification:\s([^\n]+?)(?:\sRange:\s([^\n]+?))*(?:\sStructural Integrity:\s([^\n]+))*\n
The data it is supposed to match looks like this:
Classification: Class Name Range: xxxx Structural Integrity: value
Classification: Class Name Structural Integrity: value
Classification: Class Name
That is: the “Range” and “Structural Integrity” fields are optional. So the desired result is:
{
$& [Classification: Class Name Range: xxxx Structural Integrity: value ]
$1 [Class Name ]
$2 [xxxx ]
$3 [value ]
$& [Classification: Class Name Structural Integrity: value ]
$1 [Class Name ]
$2 [value ]
$& [Classification: Class Name ]
$1 [Class Name ]
}
The expression uses the lazy quantifier +? in two places. QRegExp does not support this operator; instead, Qt provides a “minimal” property which, when set to true, makes all quantifiers in an expression non-greedy.
Armed with this information, I wrote my code:
QRegExp rx("Classification:\\s([^\\n]+)(?:\\sRange:\\s([^\\n]+))*(?:\\sStructural Integrity:\\s([^\\n]+))*\\n");
rx.setMinimal(true);
But the results are incorrect, and after much tweaking I haven’t been able to get the correct captures. Is it possible to split this up into more code and less regex? Or to rewrite it without the lazy operator?
Something like this: use a small expression that matches either a valid key followed by a colon, or a word, and apply it repeatedly along the line. If the match is a key, switch the current list to the corresponding one. Otherwise, add the word to the current list.
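The key-or-word loop described above can be sketched as follows. This is only an illustration under some assumptions: it uses std::regex instead of QRegExp so that it compiles without Qt, and the names Fields, join and parseLine are made up for the example. The pattern contains no lazy quantifiers, so the same idea should carry over to a QRegExp loop built on indexIn(), matchedLength() and cap(), though that port is not shown here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical container for the three word lists (names are assumptions).
struct Fields {
    std::vector<std::string> classification, range, integrity;
};

// Join a list of words with single spaces.
std::string join(const std::vector<std::string>& words) {
    std::string out;
    for (const auto& w : words) {
        if (!out.empty()) out += ' ';
        out += w;
    }
    return out;
}

// Walk the line token by token. A token is either a known key followed by
// a colon (group 1) or a run of non-space characters (group 2).
Fields parseLine(const std::string& line) {
    // The key alternative comes first so that "Range:" is not taken as a word.
    static const std::regex token(
        R"((Classification|Range|Structural Integrity):|(\S+))");
    Fields f;
    std::vector<std::string>* current = nullptr;  // list selected by the last key
    for (std::sregex_iterator it(line.begin(), line.end(), token), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {
            const std::string key = m[1].str();
            if (key == "Classification") current = &f.classification;
            else if (key == "Range")     current = &f.range;
            else                         current = &f.integrity;
        } else if (current) {
            current->push_back(m[2].str());  // words before any key are ignored
        }
    }
    return f;
}
```

Calling parseLine on one input line and then join on each list gives the three field values as plain strings; a list that stays empty means the corresponding key was absent.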
In the end, you will have the lists classification, range and integrity, containing the words after the corresponding keys. You could join them after the full match is done. It does not care about the order of the keys though.
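If the key order should be enforced, another option is to keep a single expression and replace each lazy quantifier with a greedy one guarded by a negative lookahead, so a field stops right before the next key. The sketch below is an assumption-laden illustration rather than a tested QRegExp solution: it uses std::regex so it is self-contained, and Record and matchRecord are hypothetical names. QRegExp documents (?!...) lookahead, so the pattern may port with the backslashes doubled, but that has not been verified here.

```cpp
#include <regex>
#include <string>

// Hypothetical result type; an empty string stands for an absent field.
struct Record {
    std::string classification, range, integrity;
};

// Match one newline-terminated record without any lazy quantifiers.
bool matchRecord(const std::string& line, Record& out) {
    static const std::regex rx(
        // Class name: any characters up to (not including) the next key.
        R"(Classification:\s((?:(?!\sRange:|\sStructural Integrity:)[^\n])+))"
        // Optional range: stops before " Structural Integrity:".
        R"((?:\sRange:\s((?:(?!\sStructural Integrity:)[^\n])+))?)"
        // Optional structural integrity: runs to the end of the line.
        R"((?:\sStructural Integrity:\s([^\n]+))?\n)");
    std::smatch m;
    if (!std::regex_search(line, m, rx)) return false;
    out.classification = m[1].str();
    out.range = m[2].str();      // empty when the group did not participate
    out.integrity = m[3].str();  // empty when the group did not participate
    return true;
}
```

Compared with the key-or-word loop, this keeps everything in one expression at the cost of a denser pattern, and it only accepts the keys in the order Classification, Range, Structural Integrity.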