I never thought it would be possible to write a regex that never returns.
The regex
`/^((?:\d|\w{1,2}[-\d\s])(?:[-\s\d]|\w{1,2}[-\d\s])*\d)$/`
was built to match numbers that start with either a digit or one or two letters followed by a dash, a blank, or a digit, and that end with a digit. In between, the starting pattern can repeat, or a whitespace character or a dash can occur.
Examples: `1234`, `de-12943`, `EN - 12de -50`
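To make the intended structure easier to read, here is the same pattern written out with Ruby's `/x` (extended) flag, which ignores whitespace in the pattern and allows a comment per part. This is a sketch that should behave like the one-line form; the third sample uses a plain ASCII hyphen, since an en dash would not match `[-\d\s]`, and `12OXM3` is an added non-matching input for contrast.

```ruby
# The question's pattern, spelled out with the /x (extended) flag so that
# whitespace is ignored and each part can carry a comment.
PATTERN = /
  ^(
    (?: \d                 # start: a single digit
      | \w{1,2} [-\d\s]    # or one or two word chars, then a dash, digit or whitespace
    )
    (?: [-\s\d]            # middle: a dash, whitespace or digit
      | \w{1,2} [-\d\s]    # or the word-char prefix again
    )*
    \d                     # end: a digit
  )$
/x

["1234", "de-12943", "EN - 12de -50", "12OXM3"].each do |s|
  p [s, PATTERN.match?(s)]
end
```

The first three inputs match; `12OXM3` does not, because three letters in a row can never be covered by `\w{1,2}[-\d\s]`. On inputs this short the failure is found quickly.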
The following example code does not terminate:
```ruby
#!/usr/bin/ruby
string = "101000000750000000000000000000000001000038127OXMOO0OOOOO00000000000N9"
re = /^((?:\d|\w{1,2}[-\d\s])(?:[-\s\d]|\w{1,2}[-\d\s])*\d)$/
p re.match(string)
```
```scala
"""^((?:\d|\w{1,2}[-\d\s])(?:[-\s\d]|\w{1,2}[-\d\s])*\d)$""".r findFirstIn "101000000750000000000000000000000001000038127OXMOO0OOOOO00000000000N9"
```
Removing the anchors (`^`, `$`) lets the regex terminate quickly.
I tried this with Ruby and Scala.
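For comparison, a minimal Ruby sketch of the unanchored case. Here only the trailing `$` is dropped (an assumption for illustration; the question removes both anchors). Without `$`, the match is allowed to end before `OXM`, so the engine can stop at the leading run of digits instead of backtracking through every alternative.

```ruby
string = "101000000750000000000000000000000001000038127OXMOO0OOOOO00000000000N9"

# Same pattern as in the question, but without the trailing $ anchor,
# so a match may end anywhere in the string.
unanchored = /^((?:\d|\w{1,2}[-\d\s])(?:[-\s\d]|\w{1,2}[-\d\s])*\d)/

m = unanchored.match(string)
# The match stops right before "OXM": it is the leading run of digits.
p m[0]
```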
What’s happening there?
Shouldn’t the anchors lead to faster termination?
Firstly, `\w` is not a letter but `[a-zA-Z0-9_]`. So if you really just want letters there, make that `[a-zA-Z]`.

Secondly, I suppose you might have a case of catastrophic backtracking.
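The first point is easy to check in Ruby. This sketch prints which single characters each class accepts, and shows that `\w{1,2}` happily covers a digit followed by a letter such as `7O`, which is the kind of overlap with `[-\s\d]` that gives the engine extra ways to split the input.

```ruby
# \w accepts letters, digits and underscore; [a-zA-Z] accepts letters only.
["a", "Z", "7", "_"].each do |ch|
  p [ch, ch.match?(/\w/), ch.match?(/[a-zA-Z]/)]
end

# \w{1,2} can start on a digit, so "7O" counts as one or two "letters" ...
p "7O".match?(/\A\w{1,2}\z/)
# ... whereas the letters-only class rejects it.
p "7O".match?(/\A[a-zA-Z]{1,2}\z/)
```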
Your regex obviously does not get past `OXM`, because there is no way to match three consecutive letters with your pattern. If you remove the `$` anchor, the regex will gladly match up to that point, but if you leave it in, the match fails there and the engine starts to backtrack.

So suppose it matched `OX` with `\w{1,2}` and then failed. It throws away the last repetition of the whole second non-capturing group and goes back one step, to where it matched `7` with `[-\s\d]`. Now it tries to match `7O` or `7` with `\w{1,2}` instead, but again fails to match `[-\d\s]` against `X` or `O`, respectively. Another step back, it tries to rematch `27` or `2` with `\w{1,2}`. `27` fails against the following `O`, but `2` followed by `7` fits `\w{1,2}[-\d\s]`, so the engine goes forward to `OXM` again and fails there once more. The further back it goes, the more of these alternative splits it finds, and each one sends the engine all the way forward to `OXM` again. When backtracking finally reaches the beginning of the string and your very first alternation, it will try all three options for that alternation too (`\d`, or `\w{1,2}` taking two or one characters), and do the whole thing over and over again.

Let me try to visualize the first steps of the backtracking by writing out which alternations are used in the repetition. The first of each two lines is the tested string, the second contains the corresponding regex constructs used. Each attempt fails at the last character.
And so on. I hope you get the idea. It’s hard to visualize it in a few lines of ASCII.
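Instead of ASCII art, one can count how many different ways the repeated group `(?:[-\s\d]|\w{1,2}[-\d\s])*` can split a run of digits. The helper below (`count_splits` is a hypothetical name made up for this illustration) counts them with a small dynamic program. It is only a rough proxy for the backtracking work, not a model of what Ruby's or Java's regex engine literally executes, but it shows the growth: each digit can be consumed by `[-\s\d]` on its own or as part of a `\w{1,2}[-\d\s]` chunk.

```ruby
# Hypothetical helper: counts how many ways (?:[-\s\d]|W{1,2}[-\d\s])* can
# split the whole of str, where W is the class passed in as word_char.
def count_splits(str, word_char)
  sep = /[-\s\d]/                 # same set as [-\d\s]
  n = str.length
  ways = Array.new(n + 1, 0)
  ways[0] = 1                     # empty prefix: one (empty) split
  (1..n).each do |i|
    next unless sep.match?(str[i - 1])  # every chunk ends with a [-\s\d] char
    ways[i] += ways[i - 1]              # chunk: [-\s\d] on its own
    if i >= 2 && word_char.match?(str[i - 2])
      ways[i] += ways[i - 2]            # chunk: one word char + [-\d\s]
    end
    if i >= 3 && word_char.match?(str[i - 2]) && word_char.match?(str[i - 3])
      ways[i] += ways[i - 3]            # chunk: two word chars + [-\d\s]
    end
  end
  ways[n]
end

[10, 20, 30].each do |n|
  digits = "0" * n
  p [n, count_splits(digits, /\w/), count_splits(digits, /[a-zA-Z]/)]
end
```

For ten digits this gives 274 splits with `\w` (the counts follow the tribonacci sequence) and exactly one with `[a-zA-Z]`, since a digit can then only be consumed by `[-\s\d]`.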
I suppose that just changing `\w` to the appropriate character class might already solve the problem, because there are fewer equivalent combinations. Try it out.
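Trying it out in Ruby: the sketch below is the question's pattern with every `\w` swapped for `[a-zA-Z]`, assuming only ASCII letters should be allowed before the dash (`letters_only` is just a local name). A digit can now only be consumed by `[-\s\d]`, so there is essentially one way to split a run of digits, and the failing match on the problematic string comes back quickly while the sample inputs (with an ASCII hyphen) still match.

```ruby
string = "101000000750000000000000000000000001000038127OXMOO0OOOOO00000000000N9"

# Suggested change: \w replaced by [a-zA-Z] (assumes ASCII letters only).
letters_only = /^((?:\d|[a-zA-Z]{1,2}[-\d\s])(?:[-\s\d]|[a-zA-Z]{1,2}[-\d\s])*\d)$/

p letters_only.match(string)   # prints nil: no match, found without blowing up

["1234", "de-12943", "EN - 12de -50"].each do |s|
  p [s, letters_only.match?(s)]
end
```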