This appears to be strange behavior, or perhaps I don’t understand regular expressions so well…
I’m using this to find all the xref and trailer objects in a PDF file:
preg_match_all('@(\nxref\r?\n)|(\strailer\s)@',$pdfcontent,$matches,PREG_OFFSET_CAPTURE);
print_r gives me this:
Array
(
[0] => Array
(
[0] => Array
(
[0] =>
xref
[1] => 13235519
)
[1] => Array
(
[0] =>
trailer
[1] => 13299371
)
)
[1] => Array
(
[0] => Array
(
[0] =>
xref
[1] => 13235519
)
[1] => Array
(
[0] =>
[1] => -1
)
)
[2] => Array
(
[0] =>
[1] => Array
(
[0] =>
trailer
[1] => 13299371
)
)
)
Why is there a position of -1 for xref?
It seems this is the normal behaviour, though it is mostly undocumented. The -1 offset is also used for absent matches.
To answer your title: the -1 offset is returned alternatively, not in addition. You have an alternative (a)|(b) match group in your pattern, so it can very well return offsets and matches for the xref, but a non-match for the trailer.
This is not mentioned explicitly in the PHP manual page. But PCRE documents it cursorily with:
You can reproduce it with a simpler example:
The behaviour is a bit confusing. It seems that -1 is used as the offset for the early non-matches, but subsequent failed matches are just absent from the result array. This example gives, for instance, [0,-1,-1], [undef,1,-1] and [undef,undef,2]. I would conclude it's some hazy behaviour in the PHP wrapper.
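For reference, here is a minimal sketch of such a reproduction. It is not the original example: the pattern (a)|(b)|(c), the subject 'abc' and the helper name capture_offset are all assumptions chosen for illustration. The helper treats both a -1 offset and a bare empty string (as seen at [2][0] in the question's output) as "this group did not take part in the match", so the result should not depend on how a given PHP version fills in unmatched trailing groups.

```php
<?php
// Hypothetical helper: return the offset of a capture entry, or null
// when the group did not participate in that match.
function capture_offset($entry)
{
    // Unmatched trailing groups may be a bare '' instead of a
    // [text, offset] pair; unmatched earlier groups carry offset -1.
    if (!is_array($entry) || $entry[1] < 0) {
        return null;
    }
    return $entry[1];
}

// Assumed test case: three alternatives, one capture group each.
$subject = 'abc';
preg_match_all('/(a)|(b)|(c)/', $subject, $matches, PREG_OFFSET_CAPTURE);

// Collect the normalised offsets per capture group (pattern order).
$offsets = [];
foreach ([1, 2, 3] as $group) {
    $offsets[$group] = array_map('capture_offset', $matches[$group]);
}

var_export($offsets);
```

With this normalisation each group reports a real offset only for the one match it took part in, which is usually what you want when scanning for xref and trailer positions with a single alternation.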