I have run into a problem when using regular expressions:
php> $html = "<html><head><body><h1>hello world</h1><img src=\"data:rawIMGdata\" /><p/><img src=\"sdfsdf.jpg\" title=\"pic1\" /><p/><div class=\"myclass\"><img src=\"data:imageData\" /></div><img alt=\"bla\" src=\"bla.jpg\" title=\"bla\" /></body></html>";
php> $pat = '/<img.*src="(data:.*)"/m';
php> preg_match_all($pat, $html, $matching);
php> var_dump($matching);
array(2) {
[0]=>
array(1) {
[0]=>
string(169) "<img src="data:rawIMGdata" /><p/><img src="sdfsdf.jpg" title="pic1" /><p/><div class="myclass"><img src="data:imageData" /></div><img alt="bla" src="bla.jpg" title="bla""
}
[1]=>
array(1) {
[0]=>
string(63) "data:imageData" /></div><img alt="bla" src="bla.jpg" title="bla"
}
}
I expected the second array to contain just “data:imageData”, and there should also be two matches in total (the other being “data:rawIMGdata”).
Did I define my regex incorrectly?
Regards,
Broncko
You might want to consider using DOMDocument for parsing HTML. If this example is as complex as your input is going to get, you can probably get away with a regex, but DOMDocument will always be more robust.
Try this:
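A sketch of such a pattern, consistent with the explanation that follows. The exact pattern is an assumption, and it reuses the variable names from the question:

```php
<?php
// Same HTML as in the question, single-quoted to avoid escaping.
$html = '<html><head><body><h1>hello world</h1><img src="data:rawIMGdata" /><p/><img src="sdfsdf.jpg" title="pic1" /><p/><div class="myclass"><img src="data:imageData" /></div><img alt="bla" src="bla.jpg" title="bla" /></body></html>';

// Assumed pattern: .*? is the non-greedy form of .*,
// and [^"]* stops the capture at the closing quote of the src attribute.
$pat = '/<img.*?src="(data:[^"]*)"/';

preg_match_all($pat, $html, $matching);

// $matching[1] holds the captured src values.
var_dump($matching[1]);
```

With this input, `$matching[1]` contains the two values "data:rawIMGdata" and "data:imageData". Note that the second full match in `$matching[0]` still begins at the preceding `<img src="sdfsdf.jpg"` tag, because `.*?` may cross tag boundaries. Replacing `.*?` with `[^>]*` keeps each match inside a single tag.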
The ? makes the preceding * non-greedy, so it matches as little as possible. By default, * is greedy and grabs as much as it can.
And rather than matching anything with ., you can match anything that isn’t a " by using [^"].
The .* in your pattern was being greedy and matching all the way up to a " in a later element.
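For the DOMDocument route mentioned above, a minimal sketch could look like the following. The variable name `$dataSrcs` is made up for illustration, and the `data:` prefix check is an assumption about what you want to keep:

```php
<?php
// Same HTML as in the question, single-quoted to avoid escaping.
$html = '<html><head><body><h1>hello world</h1><img src="data:rawIMGdata" /><p/><img src="sdfsdf.jpg" title="pic1" /><p/><div class="myclass"><img src="data:imageData" /></div><img alt="bla" src="bla.jpg" title="bla" /></body></html>';

// Collect parser warnings internally instead of printing them;
// the markup in the question is not strictly valid HTML.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Hypothetical result array: every img src that starts with "data:".
$dataSrcs = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    if (strpos($src, 'data:') === 0) {
        $dataSrcs[] = $src;
    }
}

var_dump($dataSrcs);
```

This avoids the greediness question entirely, since the parser hands you each src attribute on its own regardless of attribute order or quoting style.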