I have a CMS that lets people use HTML as well, but nl2br is applied at the end of the function, which turns this:
<ul>
<li></li>
</ul>
into this:
<ul><br/>
<li></li><br/>
</ul>
Now I want to remove the <br/> tags that sit between the <ul> tags.
I already found another question that asks almost the same thing, but for newlines. I’ve integrated that into my CMS, but for one client all the content is already filled in, so I have to fix this after the \n characters have been replaced with <br/> tags.
The other question provides this as a regex to match \n within <ul></ul>:
/(?<=<ul>|<\/li>)\s*?(?=<\/ul>|<li>)/is
I’d think something like this:
/(?<=<ul>|<\/li>)(<br>|<br\/>|<br \/>)(?=<\/ul>|<li>)/is
Would do the trick, but it doesn’t. What am I missing?
EDIT
I am very open to DOMDocument solutions; if there’s a way to query line breaks with XPath, that would probably fix my problem.
In the example you provided, the <br> tags are surrounded by some whitespace (at least by newline characters, because nl2br inserts the tag before each newline and keeps the newline itself), so this needs to be reflected in the corresponding regular expression.
In many cases regular expressions are NOT the best way to parse HTML (I definitely agree with the comments above/below), but they are good enough for some particular purposes.
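As a rough sketch of what that could look like (not necessarily the only or best pattern), the expression from the question can be extended with optional whitespace around each <br> variant. The function name strip_list_breaks and the sample input are made up for illustration. The pattern assumes a plain <ul> tag without attributes, since the lookbehind matches the literal text only.

```php
<?php
// Hypothetical helper: removes <br>, <br/> and <br /> tags (plus the
// whitespace around them) that sit directly after <ul> or </li> and
// directly before <li> or </ul>.
function strip_list_breaks(string $html): string
{
    // \s* on both sides absorbs the newline that nl2br() leaves behind.
    // (?: ... )+ also covers several consecutive <br> tags.
    $pattern = '~(?<=<ul>|</li>)(?:\s*<br\s*/?>)+\s*(?=</ul>|<li>)~i';

    return preg_replace($pattern, '', $html);
}

// Sample input, shaped like nl2br() output for a small list.
$html = "<ul><br />\n<li>One</li><br />\n<li>Two</li><br />\n</ul>";

echo strip_list_breaks($html);
// prints: <ul><li>One</li><li>Two</li></ul>
```

A <br> inside the text of an <li> is left alone, because it is not preceded by <ul> or </li>.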