Is it possible to write a regular expression that matches any subset of a given set of characters a1 ... an?
I.e. it should match any string in which each of these characters appears at most once, no other characters appear, and the relative order of the characters doesn't matter.
Some approaches that come to mind at once:
1. [a1...an]* or (a1|a2|...|an)*: this allows a character to appear more than once.
2. (a1?a2?...an?): no character is repeated, but the relative order is fixed, so this matches any subsequence rather than any subset.
3. ($|a1|...|an|a1a2|a2a1|...|a1...an|...|an...a1), i.e. write out every ordering of every subset (just hardcode all matching strings :)). Of course, this is not acceptable.
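The shortcomings of the first two approaches can be checked quickly. This is only a small sketch in Python: the three-character alphabet "abc" is a made-up stand-in for a1 ... an, and the names any_of and in_order are illustrative.

```python
import re

# Hypothetical example alphabet standing in for a1 ... an.
ALPHABET = "abc"

# Approach 1: a starred character class accepts repeated characters.
any_of = re.compile("[" + re.escape(ALPHABET) + "]*")

# Approach 2: each character is optional, but only in the fixed order a, b, c.
in_order = re.compile("".join(re.escape(c) + "?" for c in ALPHABET))

print(bool(any_of.fullmatch("aab")))    # True: repetition is wrongly accepted
print(bool(in_order.fullmatch("ac")))   # True: a valid subset, in order
print(bool(in_order.fullmatch("ca")))   # False: a valid subset is wrongly rejected
```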
I also suspect that it may be theoretically impossible, because while parsing the string we need to remember which characters we have already seen, and as far as I know regular expressions can only recognize right-linear languages.
Any help will be appreciated. Thanks in advance.
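Can't think how to do it with a single regex, but this is one way to do it with n regexes (I will use 1 2 … m n etc. for your a's):

^[23..mn]*1?[23..mn]*$
^[13..mn]*2?[13..mn]*$
...
^[12..m]*n?[12..m]*$

If all of the above match, your string is a subset of 12..mn with no repeated elements.

How this works: each line requires the string to consist exactly of:
- any number of allowed elements other than a particular one
- optionally, a particular one
- any number of allowed elements other than a particular one

If this passes when every element in turn is considered as a particular one, we know:
- the string contains nothing except the allowed elements
- each allowed element appears at most once
as required.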
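The n-regex check described above could be generated rather than typed out by hand. The following is a rough sketch in Python, assuming the allowed elements are single, distinct characters; subset_patterns and is_subset are hypothetical names chosen for illustration.

```python
import re

def subset_patterns(alphabet):
    """Build one pattern per character: others*, char?, others*."""
    patterns = []
    for ch in alphabet:
        others = "".join(re.escape(c) for c in alphabet if c != ch)
        # With a one-character alphabet there are no "others", and an
        # empty class [] is invalid, so keep only the optional character.
        rest = "[" + others + "]*" if others else ""
        patterns.append(re.compile(rest + re.escape(ch) + "?" + rest))
    return patterns

def is_subset(s, alphabet):
    """True if every per-character pattern matches the whole string."""
    return all(p.fullmatch(s) for p in subset_patterns(alphabet))

print(is_subset("cab", "abc"))  # True
print(is_subset("aab", "abc"))  # False: 'a' appears twice
print(is_subset("abd", "abc"))  # False: 'd' is not allowed
```

fullmatch is used in place of the ^ and $ anchors, so each pattern has to cover the entire string.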
For completeness, I should say that I would only do this if I were under orders to "use regex"; if not, I'd track which allowed elements have been seen and iterate over the characters of the string, doing the obvious thing.
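The non-regex alternative might look something like this. It is a minimal sketch in Python, again assuming single-character elements; is_subset_no_repeats is a made-up name.

```python
def is_subset_no_repeats(s, alphabet):
    """Check that s uses only allowed characters, each at most once."""
    allowed = set(alphabet)
    seen = set()
    for ch in s:
        # Reject anything outside the alphabet or already encountered.
        if ch not in allowed or ch in seen:
            return False
        seen.add(ch)
    return True

print(is_subset_no_repeats("cab", "abc"))  # True
print(is_subset_no_repeats("aab", "abc"))  # False
```

This makes a single pass over the string and needs no pattern construction at all.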