I’m using preg_match_all with PREG_PATTERN_ORDER to find values in a string and work out their positions. When I then pass those positions to substr, the extracted text doesn’t line up with the matches. I expected a constant offset, but the misalignment seems to differ from case to case.
Is this because preg_match_all is returning bytes and not characters? If so, is there a way to convert bytes to characters? If I’m completely off the mark I can post some code…
Okay, here is the applicable code:
// RETURN POSITION OF START AND END TAGS TO ARRAY
function getTagPositions($strBody, $start, $end)
{
    preg_match_all(
        '/' . preg_quote($start, '/') . '([\w\s.]*?)' . preg_quote($end, '/') . '/im',
        $strBody,
        $strTag,
        PREG_PATTERN_ORDER
    );
    $intOffset = 0;
    $intIndex = 0;
    $intTagPositions = array();
    foreach ($strTag[0] as $strFullTag) {
        $intTagPositions[$intIndex] = array(
            'start' => (strpos($strBody, $strFullTag, $intOffset)),
            'end' => (strpos($strBody, $strFullTag, $intOffset) + strlen($strFullTag))
        );
        $intOffset += strlen($strFullTag);
        $intIndex++;
    }
    return $intTagPositions;
}
function arrayValRecursive($key, array $arr)
{
    $val = array();
    array_walk_recursive($arr, function ($v, $k) use ($key, &$val) {
        if ($k == $key) array_push($val, $v);
    });
    return count($val) > 1 ? $val : array_pop($val);
}
$arrayOfPositions = getTagPositions($html, $go, $stop);
$arrayOfStart = arrayValRecursive('start', $arrayOfPositions); //print_r($arrayOfStart);
$arrayOfEnd = arrayValRecursive('end', $arrayOfPositions); //print_r($arrayOfEnd);
$offset = 0;
$range = $arrayOfStart[$i] + $offset;
$rangeEnd = $arrayOfEnd[$i];
echo '<br>' . $range . ' to ' . $rangeEnd . ' is: <br>';
echo substr($html, $range, $rangeEnd);
According to the documentation for preg_match_all, if you want to get offsets into the string, use PREG_OFFSET_CAPTURE.
Here’s an example:
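A minimal sketch of PREG_OFFSET_CAPTURE, consistent with the results described in this answer. The subject string 'Hello, world!' and the pattern /\S+/ are assumptions chosen for illustration, not taken from the original post:

```php
<?php
// Assumed inputs for illustration only.
$subject = 'Hello, world!';
$pattern = '/\S+/'; // one or more non-whitespace characters

// With PREG_OFFSET_CAPTURE, every match becomes array(matched text, byte offset)
// instead of a plain string.
$count = preg_match_all(
    $pattern,
    $subject,
    $matches,
    PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE
);

print_r($matches);
```

Note that the offsets are reported in bytes, so they are meant for byte-oriented functions such as substr and strlen.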
In the output, you can see that the pattern matches $count = 2 times. It matches “Hello,” at position $matches[0][0][1] = 0 and “world!” at position $matches[0][1][1] = 7.
And here’s how you loop through all matches:
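One way to write such a loop, as a sketch under the same assumptions (the subject 'Hello, world!' and the pattern /\S+/ are illustrative, and $slices is a hypothetical helper array). It also cuts each match back out of the subject with substr, whose third argument is a length rather than an end position:

```php
<?php
// Assumed inputs for illustration only.
$subject = 'Hello, world!';
$pattern = '/\S+/';

preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

$slices = array(); // hypothetical: collects the text cut back out of $subject
foreach ($matches[0] as $match) {
    list($text, $offset) = $match; // $offset is a byte offset into $subject

    // substr() expects (string, start, LENGTH), so pass the match length,
    // not the end position.
    $slices[] = substr($subject, $offset, strlen($text));

    echo $text . ' found at offset ' . $offset . "\n";
}
```

If an end position is stored instead, the length to pass to substr would be the end minus the start.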