I’m rewriting a route-handling class for an MVC-based site in PHP and need a regex to detect a pagination string in the URL. The pagination string is formed of three different parts:
- Page number detection: /page/[NUMERIC]/
- Items per page detection: /per_page/[NUMERIC]/
- Ordering detection: /sort/[ALMOST_ANY_CHARACTER]/[asc or desc]/
Due to the way it was previously developed, these three parts can be in any order. There are a number of existing links that I need to keep working, plus the existing code used to handle pagination (no plans for a rewrite yet), so changing the pagination code to always generate a consistent URL isn’t possible.
Therefore, I need to build a regex pattern to detect every possible combination of the pagination structure. I have three patterns to detect each part, which are as follows:
- Page number detection: (page/\d+)
- Items per page detection: (per_page/\d+)
- Ordering detection: (sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc))
Being new to writing complex regex patterns (well, this is complex to me anyway!), the only way I can think of doing it is to combine the three patterns I have in every possible order of the URL structure (e.g. /pagenum/ordering/perpage/, /pagenum/perpage/ordering/), using the | operator as an ‘or’ between the combinations.
Is there a better / more efficient way of doing this?
I am running the regex using preg_match.
You could use lookaheads. After a lookahead has matched, the regex engine jumps back to the position where the lookahead started (that’s why it’s called *look*ahead; it doesn’t actually advance the position in the subject string or include anything in the match). Since you don’t know where each part occurs, start all three lookaheads from the beginning of the string, and put `.*` in front of each capturing group to allow an arbitrary position.
You can maybe even switch around the capturing groups a bit:
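As a rough sketch of what the two variants might look like in PHP: the first pattern keeps the capturing groups from the question, the second moves them so that only the values are captured. The leading / before each keyword is an assumption added here, because page/ on its own would also match the tail of per_page/. The variable names and sample URLs are made up for illustration.

```php
<?php
// Variant 1: the question's capturing groups, each preceded by .* inside a
// lookahead anchored at the start of the string.
// Assumption: the "/" before each keyword is added so that "page/" cannot
// match the tail of "per_page/".
$wholeParts = '~^(?=.*/(page/\d+))(?=.*/(per_page/\d+))'
            . '(?=.*/(sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc)))~';

// Variant 2: groups moved so that they capture only the values.
$valuesOnly = '~^(?=.*/page/(\d+))(?=.*/per_page/(\d+))'
            . '(?=.*/sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc))~';

// Hypothetical URLs with the three parts in different orders.
$urls = [
    '/products/page/3/per_page/20/sort/name/asc/',
    '/products/sort/price/desc/per_page/50/page/1/',
    '/products/page/3/',   // incomplete: per_page and sort are missing
];

foreach ($urls as $url) {
    $a = preg_match($wholeParts, $url) === 1 ? 'match' : 'no match';
    $b = preg_match($valuesOnly, $url) === 1 ? 'match' : 'no match';
    echo "$url  variant 1: $a, variant 2: $b\n";
}
```

Because every lookahead is mandatory in this form, a URL that lacks any of the three parts does not match at all.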
Now the captures will be:
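Assuming the groups wrap only the values (the digits after page/ and per_page/, the sort key, and asc or desc), group 1 would hold the page number, group 2 the items per page, group 3 the sort key and group 4 the sort direction. A minimal sketch with preg_match, where the pattern's leading slashes, the sample URL and the variable names are assumptions for illustration:

```php
<?php
// Lookahead pattern with the groups placed around the values only.
// Assumption: "/" is required before each keyword so that "page/" does not
// match inside "per_page/".
$pattern = '~^(?=.*/page/(\d+))(?=.*/per_page/(\d+))'
         . '(?=.*/sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc))~';

// Hypothetical URL with the parts in a non-canonical order.
$url = '/products/per_page/20/sort/name/asc/page/3/';

if (preg_match($pattern, $url, $m)) {
    // $m[0] is an empty string: lookaheads consume no characters.
    $page      = (int) $m[1];  // page number
    $perPage   = (int) $m[2];  // items per page
    $sortKey   = $m[3];        // sort key
    $sortOrder = $m[4];        // "asc" or "desc"

    printf("page=%d per_page=%d sort=%s order=%s\n",
           $page, $perPage, $sortKey, $sortOrder);
    // prints: page=3 per_page=20 sort=name order=asc
}
```

The group numbers depend only on the order of the lookaheads in the pattern, not on the order of the parts in the URL.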
If any of these can be optional, you can simply make the entire lookahead optional with `?`.
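A hedged sketch of the optional variant, relying on PCRE (the engine behind PHP’s preg functions) accepting a `?` quantifier on a lookahead. The function name parsePagination, the result keys and the leading slashes in the pattern are assumptions for illustration. Note that preg_match reports an unmatched group either as an empty string or, for trailing groups, not at all, so the sketch normalises both cases to null.

```php
<?php
/**
 * Hypothetical helper: extracts whichever pagination parts are present.
 * Parts that are missing from the URL come back as null.
 */
function parsePagination(string $url): array
{
    // Every lookahead is followed by "?" so that each part is optional.
    $pattern = '~^(?=.*/page/(\d+))?(?=.*/per_page/(\d+))?'
             . '(?=.*/sort/([a-zA-Z0-9\.\-_%=]+)/(asc|desc))?~';

    preg_match($pattern, $url, $m);

    // Map group numbers to names; treat missing or empty groups as null.
    $names  = [1 => 'page', 2 => 'per_page', 3 => 'sort', 4 => 'order'];
    $result = [];
    foreach ($names as $i => $name) {
        $result[$name] = (isset($m[$i]) && $m[$i] !== '') ? $m[$i] : null;
    }
    return $result;
}

// Hypothetical URLs that contain only some of the parts.
var_dump(parsePagination('/items/per_page/20/'));
var_dump(parsePagination('/items/sort/name/asc/page/2/'));
```

With every lookahead optional the pattern matches any string, so the caller has to check which values are non-null instead of relying on the return value of preg_match.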