I’m trying to use this regular expression in PHP with preg_match_all:
/\d+ (?:<[^>]+>)(?:<[^>]+>)(\S+.*\S+)(?:<[^>]+>)\s*(\S+) (?:L|R)\s*\w* \w*\s*(?:\w+\s*){14}(\d+)\s*(\d)\s*(\d*\xA0*\d{3}\xA0*\d{3})/is
Here is a sample of the data:
38 <A NAME="Philip McRae"><A HREF="xtrastats.html#Philip McRae">Philip McRae</A> C L OK 58 71 69 49 33 89 71 45 48 69 50 35 32 61 21 3 787 000
43 <A NAME="Alexander Nikulin"><A HREF="xtrastats.html#Alexander Nikulin">Alexander Nikulin</A> C L OK 41 68 71 40 28 90 67 29 31 60 31 37 34 50 26 0 0 000 <a href="http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=78680" target="_blank">HDB</a>
20 <A NAME="Christian Hanson"><A HREF="xtrastats.html#Christian Hanson">Christian Hanson</A> C R OK 57 72 71 54 33 79 70 42 45 71 46 40 36 60 25 1 875 000 <a href="http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=73824" target="_blank">HDB</a>
I have around 1500 lines.
I need to match this:
Philip McRae, C, 21, 3, 787 000 (Name, Position, Age, Contract Length, Salary)
Each time I run my code, I get this error: Fatal error: Maximum execution time of 30 seconds exceeded.
After some searching I added this line at the top of my script, but it did not solve my problem:
ini_set("pcre.backtrack_limit",10000000);
Can anyone help me optimise this regular expression?
Regards.
Patrick
I will not attempt to rewrite your regular expression since we do not have the full requirements, but the main issue here is your name group: (\S+.*\S+)
The .* is greedy, meaning it will consume as much as it can, including what you’re expecting the rest of your expression to match, and it doesn’t stop there. Since you have the /s pattern modifier, the dot will also match newlines, allowing .* to consume the entire file before trying to match \S and beginning its long backtracking journey.
One solution is to make the .* lazy with ?, i.e. .*?, but since you know the name is contained within an element you can simply use a negated character class for the entire group, such as ([^<]+).
That should fix your issue, but you probably do not want to be using the /s pattern modifier in this case, or you should at least add beginning and end of line anchors to your pattern. You should also try to limit your use of *.
Please see:
Catastrophic backtracking and
Watch out for greediness
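As a rough sketch of the negated character class suggestion, the PHP below runs a tightened pattern against two of the sample rows from the question. It is not a drop-in replacement: it assumes a single status word (OK), exactly 14 stat columns before the age, and a salary whose digit groups are separated by an ordinary space or \xA0. The variable names are only illustrative.

```php
<?php
// Two sample rows copied from the question.
$data = <<<'TXT'
38 <A NAME="Philip McRae"><A HREF="xtrastats.html#Philip McRae">Philip McRae</A> C L OK 58 71 69 49 33 89 71 45 48 69 50 35 32 61 21 3 787 000
20 <A NAME="Christian Hanson"><A HREF="xtrastats.html#Christian Hanson">Christian Hanson</A> C R OK 57 72 71 54 33 79 70 42 45 71 46 40 36 60 25 1 875 000 <a href="http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=73824" target="_blank">HDB</a>
TXT;

// ^ with /m anchors every attempt to the start of a line; there is no /s and no .*
// ([^<]+) captures the name and cannot run past the closing </A> tag.
// Assumed layout: one status word, 14 stat columns, then age, contract, salary.
$pattern = '/^\d+ <[^>]+><[^>]+>([^<]+)<[^>]+>\s*(\S+) [LR] \w+ (?:\d+ ){14}(\d+) (\d) (\d{1,3}(?:[ \xA0]\d{3})*)/im';

preg_match_all($pattern, $data, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1]..$m[5]: name, position, age, contract length, salary
    echo implode(', ', array_slice($m, 1)), PHP_EOL;
}
// Prints:
// Philip McRae, C, 21, 3, 787 000
// Christian Hanson, C, 25, 1, 875 000
```

Because every token in this version can only match one kind of character, a line that does not fit fails quickly instead of triggering a long backtracking search. If the real data has optional columns, the fixed spaces and the {14} count would need to be loosened accordingly.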