Basically, my regex is not working as I expect and I don't know why.
I am working in a fairly regulated environment, so this should not be too much of a problem. All the HTML tags are generated by a script and follow a fixed pattern: only `li`, `p` and `h3` to `h6` tags are present, all text sits between tags, and there are no spaces between tags.
I ‘need’ to write something that wraps the `li` elements in `ul` tags. Here is what I've got:
```php
preg_replace('#(<li>[^<p|<h]+</li>)(?!<li>)#', '<ul>$1</ul>', $html)
```
However, for some reason it only matches the last `<li>` pair in each run. Can anyone tell me why, please?
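To make the symptom concrete, here is a minimal reproduction of the pattern above. The `$html` string is made up for illustration and only assumes the constraints described in the question (no whitespace between tags, only `li`, `p` and `h3` to `h6`):

```php
<?php
// Hypothetical sample input following the question's constraints.
$html = '<h3>Title</h3><li>one</li><li>two</li><p>text</p><li>six</li>';

// The pattern from the question, unchanged.
$result = preg_replace('#(<li>[^<p|<h]+</li>)(?!<li>)#', '<ul>$1</ul>', $html);

echo $result, "\n";
// <h3>Title</h3><li>one</li><ul><li>two</li></ul><p>text</p><ul><li>six</li></ul>
```

Each match covers a single `<li>` element, and only the element not followed by another `<li>` passes the lookahead, so each run gets one `<ul>` around its last item.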
`[^<p|<h]` doesn't do what you expect. It matches a single character that is not any of the characters `<`, `p`, `|` or `h`. Because `<` is excluded, each match can only ever span one `<li>` element, and your lookahead `(?!<li>)` then rejects every element that is followed by another `<li>`, so only the last item of each run is wrapped. If your HTML really is as constrained as you say, and you cannot have an `<li>` nested inside another `<li>`, then the following should work:

The sequence `.*?` is just like `.*`, except that the trailing `?` is the non-greedy modifier. By default `.*` is greedy: it consumes as many characters as it can, then backtracks if the rest of the pattern doesn't match. The non-greedy modifier inverts this: it consumes as few characters as it can and takes one more character each time the rest of the pattern fails to match. As the rest of the pattern is simply `</li>`, this effectively matches all text up to, but not including, the first `</li>`. This pattern is then nested inside a capture, which is repeated with `+`, meaning it will match one or more consecutive `<li>` elements.
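A minimal sketch of a pattern built the way the explanation above describes, with a greedy variant for comparison. This is a reconstruction from the description, not necessarily the answerer's exact pattern; the sample `$html` is hypothetical and includes item text containing `p` and `h` to show that the character-class problem is gone:

```php
<?php
// Hypothetical sample input following the question's constraints.
$html = '<h3>Title</h3><li>one</li><li>phone</li><p>text</p><li>three</li>';

// <li>.*?</li> : a single <li> element; the lazy .*? stops at the first </li>.
// (?:...)+     : one or more consecutive <li> elements (non-capturing group).
// (...)        : outer capture, so $1 holds the whole run, not just the
//                last repetition.
$lazy = preg_replace('#((?:<li>.*?</li>)+)#', '<ul>$1</ul>', $html);

// Same pattern with a greedy .* for comparison: it runs to the last </li>
// in the string, swallowing the <p> that sits between the two runs.
$greedy = preg_replace('#((?:<li>.*</li>)+)#', '<ul>$1</ul>', $html);

echo $lazy, "\n";
// <h3>Title</h3><ul><li>one</li><li>phone</li></ul><p>text</p><ul><li>three</li></ul>
echo $greedy, "\n";
// <h3>Title</h3><ul><li>one</li><li>phone</li><p>text</p><li>three</li></ul>
```

Since the outer capture spans the whole match, `$0` would work equally well in the replacement string. If item text could ever contain newlines, the `s` modifier would be needed, because `.` does not match a newline by default.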