I wrote a pretty simple preg_match_all file in PHP:
$fileName = 'A_DATED_FILE_091410.txt';
$matches = array();
preg_match_all('/[0-9][0-9]/',$fileName,$matches);
print_r($matches);
My Expected Output:
$matches = array(
[0] => array(
[0] => 09,
[1] => 91,
[2] => 14,
[3] => 41,
[4] => 10
)
)
What I got instead:
$matches = array(
[0] => array(
[0] => 09,
[1] => 14,
[2] => 10
)
)
Now, in this particular use case this was preferable, but I’m wondering why it didn’t match the other substrings? Also, is a regex possible that would give me my expected output, and if so, what is it?
With a global regex (which is what preg_match_all uses), once a match is made, the regex engine continues searching the string from the end of the previous match.

In your case, the regular expression engine starts at the beginning of the string and advances until the 0, since that is the first character that matches [0-9]. It then advances to the next position (9), and since that matches the second [0-9], it takes 09 as a match. When the engine continues matching (since it has not yet reached the end of the string), it advances its position again (to 1), and the above repeats.

See also: First Look at How a Regex Engine Works Internally
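To see this resume-after-the-match behaviour directly, here is a small sketch that asks preg_match_all for the offset of each match via PREG_OFFSET_CAPTURE. The variable names $text and $start are only illustrative.

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// PREG_OFFSET_CAPTURE makes each match an array of (matched text, start offset).
preg_match_all('/[0-9][0-9]/', $fileName, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as $match) {
    list($text, $start) = $match;
    // Each search resumes where the previous match ended, so starts are 2 apart.
    echo $text, ' starts at offset ', $start, PHP_EOL;
}
// Prints:
// 09 starts at offset 13
// 14 starts at offset 15
// 10 starts at offset 17
```

The offsets step by two because each match consumes two characters, which is why 91 and 41 are never considered.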
If you must get every 2-digit sequence, you can use preg_match in a loop and use offsets to determine where each search starts. Note that the offset returned with the PREG_OFFSET_CAPTURE flag is the start of the match, so starting the next search one character past it picks up the overlapping sequence.

I’ve got another solution that will get five matches without having to use offsets, but I’m adding it here just for curiosity, and I probably wouldn’t use it myself in production code (it’s a somewhat complex regex too). You can use a regex that uses a lookbehind to look for a digit before the current position and captures that digit inside the lookbehind (in general, lookarounds are non-capturing, but a capture group placed inside one still captures).
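The original code samples did not survive here, so what follows is a sketch reconstructed from the description rather than the exact code. It shows both ideas: the offset loop, and a lookbehind pattern written with the x modifier so each part can carry a comment. The names $allSequences, $pairs and $pattern are assumptions made for illustration.

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// Approach 1: call preg_match repeatedly, restarting one character
// after the start of the previous match so overlaps are found.
$allSequences = array();
$offset = 0;
while (preg_match('/[0-9][0-9]/', $fileName, $m, PREG_OFFSET_CAPTURE, $offset)) {
    list($text, $start) = $m[0];
    $allSequences[] = $text;
    $offset = $start + 1; // $start is where the match began, not where it ended
}
print_r($allSequences); // 09, 91, 14, 41, 10

// Approach 2: a lookbehind that captures the preceding digit.
$pattern = '/
    (?<=        # lookbehind: immediately before this position there must be
      ([0-9])   #   one digit, captured as group 1
    )
    [0-9]       # then consume one digit, which becomes the full match
/x';
$count = preg_match_all($pattern, $fileName, $pairs, PREG_SET_ORDER);
echo $count, PHP_EOL; // 5
```

The loop is the easier of the two to read and maintain. The lookbehind version does the same work in a single call, but each result comes back split across the full match and group 1.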
Let’s walk through this regex:
Because lookarounds are zero-width and do not move the regex position, this regular expression will match 5 times: the engine will advance until the 9 (because that is the first position which satisfies the lookbehind assertion). Since 9 matches [0-9], the engine will take 9 as a match (but because we’re capturing in the lookaround, it’ll also capture the 0!). The engine then moves to the 1. Again, the lookbehind succeeds (and captures the 9 as subgroup 1), and the 1 is taken as the match (and so on, until the engine hits the end of the string).

When we give this pattern to preg_match_all with the PREG_SET_ORDER flag (which groups each match’s capturing groups along with its full match), we’ll end up with an array holding one entry per match, each containing the full match followed by the captured digit.

Note that each “match” has its digits out of order! This is because the capture group in the lookbehind becomes backreference 1 while the whole match is backreference 0. We can put them back together in the correct order, though, by concatenating group 1 and then group 0.
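A minimal sketch of that reassembly, assuming the lookbehind pattern is (?<=([0-9]))[0-9] as the description implies. The pattern is reconstructed from the prose and $allSequences is an illustrative name.

```php
<?php
$fileName = 'A_DATED_FILE_091410.txt';

// PREG_SET_ORDER: $matches[i] holds the full match at index 0 and group 1 at index 1.
preg_match_all('/(?<=([0-9]))[0-9]/', $fileName, $matches, PREG_SET_ORDER);

// The first set is array('9', '0'): full match '9', lookbehind capture '0'.
print_r($matches[0]);

$allSequences = array();
foreach ($matches as $match) {
    // Group 1 is the digit seen by the lookbehind, so it comes first.
    $allSequences[] = $match[1] . $match[0];
}
print_r($allSequences); // 09, 91, 14, 41, 10
```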