I have this piece of HTML:
<div class="embed">
<iframe width="300" height="200" frameborder="0" allowfullscreen="" src="http://www.youtube.com/embed/123456"></iframe>
Some text I don't want
</div>
This is how it is being inserted into the HTML:
<div class="embed"><?php echo $item['embed_html']; ?></div>
This is what
$item['embed_html']
is echoing out:
<iframe width="300" height="200" frameborder="0" allowfullscreen="" src="http://www.youtube.com/embed/123456"></iframe>Some text I don't want
So I don’t want to parse the whole document, just this specific string.
Don’t worry, this isn’t HTML entered by outside users, before anyone points out the security issues with allowing raw code on to a page…
I need to keep the HTML but drop the text (so it would look like this):
<div class="embed">
<iframe width="300" height="200" frameborder="0" allowfullscreen="" src="http://www.youtube.com/embed/123456"></iframe>
</div>
There are multiple different embed codes, so I guess what I’m asking is: what is the best way to remove text that is not wrapped in an HTML element (between < and >)? Tags such as <img, <p, <div, <iframe, <object, <embed, <video etc. may all be used in this section. If any text has been added that is not wrapped in a tag, it should be removed from the string.
I don’t want to wrap the offending text in a tag; I want to remove it completely. In a way, the reverse of strip_tags().
This is a simple regex that would do what you want in 99% of cases:
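A minimal sketch of that idea in PHP, assuming a generic tag-matching pattern. The pattern and the helper name keep_only_tags are illustrative assumptions, not necessarily what was originally posted:

```php
<?php
// Hypothetical helper: keeps only the tags found in $html and drops everything else.
// The pattern is an assumption for illustration; it does not handle a literal ">"
// inside an attribute value.
function keep_only_tags(string $html): string
{
    // Match anything that looks like an opening, closing, or self-closing tag.
    preg_match_all('~</?[a-zA-Z][^>]*>~', $html, $matches);

    // Glue the matched tags back together in their original order.
    return implode('', $matches[0]);
}

$embed = '<iframe width="300" height="200" frameborder="0" allowfullscreen="" src="http://www.youtube.com/embed/123456"></iframe>Some text I don\'t want';

// Prints the <iframe ...></iframe> pair without the trailing text.
echo keep_only_tags($embed);
```

Note that a pattern like this also drops text nested inside elements, so <p>hello</p> would come out as <p></p>.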
All it does, though, is match XML/HTML tags. That’s it. There’s no clean way of telling it to only match text inside the DOM subtree of a certain node (such as
<div class="embed">). For this you would need to use a context-free parser, such as a DOM parser.
Your sample input would be matched into:
Given this input text, however:
<!-- <foo> -->
you would end up with
<foo>
being extracted despite being technically commented out. Removing all occurrences of the regex
<!--.*?-->
beforehand should solve that, though.
Anyway, in general you’re best off using a DOM parser for anything XML/HTML.
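As a rough sketch of the DOM-parser route, assuming PHP's bundled DOM extension is available: the string is wrapped in a temporary root element, and only the element children of that root are kept. The helper name strip_bare_text and the id fragment-root are made up for this example, and character-encoding handling for non-ASCII input is left out.

```php
<?php
// Hypothetical helper: removes text that is not wrapped in any element,
// while keeping elements (and any text nested inside them) intact.
function strip_bare_text(string $html): string
{
    $doc = new DOMDocument();

    // Wrap the fragment in a known root so its top-level nodes are easy to find.
    // The @ suppresses warnings about tags libxml's HTML parser does not
    // recognise (for example <video>).
    @$doc->loadHTML('<div id="fragment-root">' . $html . '</div>');

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    if ($root === null) {
        return '';
    }

    $out = '';
    foreach ($root->childNodes as $child) {
        // Keep elements only; top-level text nodes and comments are skipped.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $out .= $doc->saveHTML($child);
        }
    }

    return $out;
}

$embed = '<iframe width="300" height="200" frameborder="0" allowfullscreen="" src="http://www.youtube.com/embed/123456"></iframe>Some text I don\'t want';

// Prints the iframe element without the trailing bare text.
echo strip_bare_text($embed);
```

Because the parser knows which nodes are comments, a commented-out tag such as the one above is never mistaken for a real element, and text inside elements like <p> survives.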