Is there a way, perhaps a regular expression or even a library, to transform a regular expression with character classes and repetition into its most basic ASCII form?
For example, I’d like to have the following conversions:
\d -> [0-9]
\w -> [A-Za-z0-9_]
\s -> [ \t\r\n\v\f]
\d{2} -> [0-9][0-9]
\d{3,} -> [0-9][0-9][0-9]+
\d{,3} -> I don’t even know how to show this...
There is a commercial product called RegexBuddy that lets you enter a regex in their syntax and then generate the version for any of a number of popular systems. There may be something similar out there for free, or you could write your own.
At its most basic, a regular expression syntax only needs two things: alternation (OR) and closure (STAR). Well, and grouping. OK, three things. Other common operators are just shortcuts, really:
etc.
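To make the "shortcuts" idea concrete, here is a minimal Python sketch that rewrites a counted repetition using only concatenation, alternation, star, and grouping. The helper name expand_repeat is hypothetical (it is not from any library), and it assumes the atom passed in is already a single unit, such as one character, a bracket class, or a parenthesised group.

```python
import re
from typing import Optional


def expand_repeat(atom: str, lo: int, hi: Optional[int]) -> str:
    # Hypothetical helper: rewrite atom{lo,hi} without counted repetition.
    # hi=None means "no upper bound", i.e. atom{lo,}.
    required = atom * lo                 # the mandatory copies, concatenated
    if hi is None:
        return required + atom + "*"     # unbounded tail becomes a star
    optional = ""
    for _ in range(hi - lo):
        # Nest each optional copy: (a(a|)|) matches "", "a", or "aa".
        optional = "(" + atom + optional + "|)"
    return required + optional


print(expand_repeat("a", 1, None))       # a+      -> aa*
print(expand_repeat("a", 0, 1))          # a?      -> (a|)
print(expand_repeat("[0-9]", 2, 2))      # \d{2}   -> [0-9][0-9]
print(expand_repeat("[0-9]", 0, 3))      # \d{,3}  -> ([0-9]([0-9]([0-9]|)|)|)

# Sanity check against the regex engine: zero to three digits, nothing longer.
pattern = expand_repeat("[0-9]", 0, 3)
assert all(bool(re.fullmatch(pattern, "1" * n)) == (n <= 3) for n in range(6))
```

The empty alternative in (a|) is what stands in for "optional" here; a flavor that rejects empty alternatives would need a different encoding.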
Things like \d just map to character classes and then to alternations. Negated character classes and . map to very big alternations. 🙂
There are some features that don’t translate, however, such as lookaround. Mapping those to something that works without the feature is not readily automatable; it will depend upon the particular circumstances motivating their use.
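For the character-class mapping described above, a rough Python sketch might look like the following. The names SHORTHAND, expand_shorthand, and class_to_alternation are made up for illustration, the table only covers the three ASCII shorthands from the question, and the scanner deliberately ignores the case of a shorthand inside a bracket expression (for example [\d-]), which a real tool would have to handle.

```python
import re

# Assumed ASCII-only meanings, taken from the conversions listed in the question.
SHORTHAND = {
    "d": "[0-9]",
    "w": "[A-Za-z0-9_]",
    "s": r"[ \t\r\n\v\f]",
}


def expand_shorthand(pattern: str) -> str:
    # Walk the pattern, consuming escapes as two-character pairs so that an
    # escaped backslash followed by "d" is not mistaken for the digit class.
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            out.append(SHORTHAND.get(nxt, ch + nxt))  # unknown escapes pass through
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def class_to_alternation(chars: str) -> str:
    # Spell a class out as a plain alternation of its (escaped) members.
    return "(" + "|".join(re.escape(c) for c in chars) + ")"


print(expand_shorthand(r"\d{2}-\w+"))      # [0-9]{2}-[A-Za-z0-9_]+
print(class_to_alternation("0123456789"))  # (0|1|2|3|4|5|6|7|8|9)
```

A negated class or the dot would go through the same class_to_alternation step, just with every character outside the excluded set listed, which is why those alternations get so large.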