I’m trying to develop a code validation system for an in-house markup language, but I’m having a little trouble due to my inexperience with regexes. The tags in the language follow the format:
{ tag : number | phrase 1 | phrase 2 … | phrase n }
where number is one of the values 3.0, 3.5, 4.0 … 8.5, exactly one of the phrases must end with an asterisk, and there must be at least two phrases. Please note that the tags are case-insensitive and whitespace does not matter.
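Because these rules (a tag name, a number from a fixed set, at least two phrases, exactly one trailing asterisk) are easier to check one at a time than inside a single pattern, here is a minimal Python sketch that splits the tag and tests each rule separately. It assumes the layout used in the examples below, `{ mw : 3.5 | phrase 1 | phrase 2* | phrase 3}`, with a `|` before every phrase; the function name `is_valid_tag` and the default tag `mw` are illustrative only, not part of any existing system.

```python
import re

# Mirrors the question's regex: a digit 3-8, optionally followed by .0 or .5,
# so the accepted values are 3, 3.0, 3.5, ..., 8, 8.0, 8.5.
NUMBER_RE = re.compile(r"[3-8](?:\.[05])?")


def is_valid_tag(text: str, tag: str = "mw") -> bool:
    """Check one tag of the (assumed) form { tag : number | p1 | p2 ... | pn }."""
    s = text.strip()
    if not (s.startswith("{") and s.endswith("}")):
        return False
    head, sep, rest = s[1:-1].partition("|")
    if not sep:
        return False  # no phrases at all
    name, colon, number = head.partition(":")
    # Tag names are case-insensitive; surrounding whitespace is ignored.
    if not colon or name.strip().lower() != tag.lower():
        return False
    # fullmatch rejects anything extra, such as the "*" in "3.5*".
    if not NUMBER_RE.fullmatch(number.strip()):
        return False
    phrases = [p.strip() for p in rest.split("|")]
    if len(phrases) < 2:
        return False
    # Every phrase needs some text besides a trailing asterisk.
    if any(not p.rstrip("*").strip() for p in phrases):
        return False
    # Exactly one asterisk overall, and it must end a phrase.
    starred = [p for p in phrases if p.endswith("*")]
    return len(starred) == 1 and sum(p.count("*") for p in phrases) == 1


for sample in [
    "{ mw : 3.5 | phrase 1 | phrase 2* | phrase 3}",
    "{ mw : 3.5 | phrase 1* | phrase 2* | phrase 3}",
    "{ mw : 3.5* | phrase 1 | phrase 2* | phrase 3}",
]:
    print(is_valid_tag(sample), sample)
```

Run as-is, it prints `True` for the correct case and `False` for both incorrect cases. Counting asterisks after splitting is what makes "exactly one" straightforward here.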
The regex I’m using is:
\{ ?(mw) ?: ?[3-8]{1}(.0|.5)? ?((((\| ?(\w ?)+[\p{P}]? ?)*)+((\| ?(\w\ ?)+[\p{P}]?)* ?\* ?)+((\| ?(\w ?)+[\p{P}]? ?)*)?)|(((\| ?(\w ?)+[\p{P}]? ?)*)?((\| ?(\w ?)+[\p{P}]?)* ?\* ?)+((\| ?(\w ?)+[\p{P}]? ?)*)+))( ?\})
which does match the correct case of:
{ mw : 3.5 | phrase 1 | phrase 2* | phrase 3}
but also the incorrect cases of:
{ mw : 3.5 | phrase 1* | phrase 2* | phrase 3} [Two asterisks]
and
{ mw : 3.5* | phrase 1 | phrase 2* | phrase 3} [An asterisk with the number value]
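Two details of the regex above help explain the false positives, and both can be checked in isolation. In Unicode, `*` belongs to the punctuation general category (`Po`), so `[\p{P}]?` can absorb an asterisk at the end of a phrase, which is one way the two-asterisk example slips through. Separately, the unescaped `.` in `(.0|.5)` matches any character. Python's `re` module does not support `\p{P}`, so this sketch checks the category with `unicodedata` instead; the names `loose` and `strict` are illustrative, and this is a diagnostic, not a fix for the whole pattern.

```python
import re
import unicodedata

# An unescaped "." matches any character, so (.0|.5) accepts "A0" and "B5".
loose = re.compile(r"[3-8](.0|.5)?")
# Escaping the dot restricts it to a literal ".".
strict = re.compile(r"[3-8](\.0|\.5)?")

for candidate in ["3.5", "3A0", "3B5"]:
    print(candidate, bool(loose.fullmatch(candidate)), bool(strict.fullmatch(candidate)))

# "*" is Unicode punctuation (general category "Po"), so a class such as
# [\p{P}] in engines that support it also matches an asterisk.
print(unicodedata.category("*"))
```

The loop shows `3A0` and `3B5` accepted by the loose pattern and rejected by the escaped one, and the last line prints `Po`.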
Thanks for any help.
Also, if anyone can offer some insight into how data validation systems typically work, I would appreciate it.
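On the general question, one common structure is two passes: first locate candidate constructs with a deliberately loose pattern, then parse each one and run a list of independent rules, each producing a readable message, instead of returning a single yes/no. The Python sketch below illustrates that structure for this tag format; the names `CANDIDATE_RE`, `Tag`, `parse`, `RULES` and `validate` are hypothetical, and the tag layout (`|` before every phrase, tag name `mw`) is assumed from the examples above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Pass 1: a loose pattern that finds anything that looks like a tag.
CANDIDATE_RE = re.compile(r"\{[^{}]*\}")


@dataclass
class Tag:
    name: str
    number: str
    phrases: list[str]


def parse(raw: str) -> Tag | None:
    """Split '{ name : number | p1 | ... }' into parts; None if the shape is wrong."""
    head, sep, rest = raw[1:-1].partition("|")
    name, colon, number = head.partition(":")
    if not (sep and colon):
        return None
    return Tag(name.strip(), number.strip(), [p.strip() for p in rest.split("|")])


# Pass 2: each rule is (message, check); a check returns True when the tag is OK.
RULES = [
    ("unknown tag name", lambda t: t.name.lower() == "mw"),
    ("number must be one of 3.0, 3.5, ... 8.5",
     lambda t: re.fullmatch(r"[3-8](?:\.[05])?", t.number) is not None),
    ("at least two phrases required", lambda t: len(t.phrases) >= 2),
    ("exactly one phrase must end with '*'",
     lambda t: sum(p.count("*") for p in t.phrases) == 1
     and sum(p.endswith("*") for p in t.phrases) == 1),
]


def validate(document: str) -> list[tuple[int, str]]:
    """Return (offset, message) pairs for every problem found in the document."""
    problems = []
    for m in CANDIDATE_RE.finditer(document):
        tag = parse(m.group())
        if tag is None:
            problems.append((m.start(), "malformed tag"))
            continue
        for message, check in RULES:
            if not check(tag):
                problems.append((m.start(), message))
    return problems


print(validate("Intro { mw : 3.5 | a | b* } then { MW : 3.5* | a* | b* } end"))
```

For that input the first tag passes, and the second tag is reported twice (bad number and too many asterisks), both at the same offset. Keeping each rule separate makes it easy to report every problem in a tag at once and to add rules later without touching one large regex.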
UPDATE
Some notes.
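- ` ?` says “0 or 1 space characters”. You may have meant `\s*`, which means “0 or more whitespace characters”.
- `(.0|.5)` actually matches `A0` and `B5`, because an unescaped `.` matches any character; write `(\.0|\.5)` to match a literal dot.
- `[\p{P}]?`.

UPDATE 2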
I highly doubt you’re able to add flags, but the `x` flag would shorten this regex considerably:
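As a rough illustration of what the `x` (extended) flag buys you, here is a sketch in Python, where `re.VERBOSE` (alias `re.X`) plays the same role: whitespace and `#` comments inside the pattern are ignored, so literal spaces have to be written as `\s*`. This is not the answer's own regex; the tag name `mw` is hard-coded from the examples, and the phrase rule is loosened to "any text containing at least one word character". The lookahead is what enforces "exactly one asterisk", which is hard to express by enumerating where the asterisk may appear.

```python
import re

TAG_RE = re.compile(
    r"""
    \{ \s* mw \s* : \s*                        # opening brace and tag name
    [3-8] (?: \. [05] )? \s*                   # number: 3, 3.0, 3.5 ... 8, 8.0, 8.5
    (?= [^*}]* \* [^*}]* \} )                  # exactly one '*' before the closing brace
    (?: \| [^|*{}]* \w [^|*{}]* \*? \s* ){2,}  # two or more phrases, '*' only at the end
    \}
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_valid(text: str) -> bool:
    # fullmatch so that nothing may precede or follow the tag.
    return TAG_RE.fullmatch(text.strip()) is not None


for sample in [
    "{ mw : 3.5 | phrase 1 | phrase 2* | phrase 3}",
    "{ mw : 3.5 | phrase 1* | phrase 2* | phrase 3}",
    "{ mw : 3.5* | phrase 1 | phrase 2* | phrase 3}",
]:
    print(is_valid(sample), sample)
```

The two-asterisk case fails the lookahead, and `3.5*` fails because a phrase must start with `|` right after the number.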