I’ve got a bunch of HTML data that I’m writing to a PDF file using PHP. In the PDF, I want all of the HTML to be stripped and cleaned up. So for instance:
<ul>
<li>First list item</li>
<li>Second list item which is quite a bit longer</li>
<li>List item with apostrophe 's 's</li>
</ul>
Should become:
First list item
Second list item which is quite a bit longer
List item with apostrophe 's 's
However, if I simply use strip_tags(), I get something like this:
First list item

Second list item which is quite a bit
longer

List item with apostrophe ’s ’s
Also note the indentation of the output.
Any tips on how to properly clean up the HTML into nice, clean strings without messy whitespace and odd characters?
Thanks 🙂
You can decode the HTML entities left in the result of strip_tags() using html_entity_decode(), or remove them using preg_replace():
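A minimal sketch of both options, assuming the odd characters come from entities such as &#39; that strip_tags() leaves untouched. The sample input and the entity pattern are illustrative assumptions, not taken from your data:

```php
<?php
// Hypothetical sample input; &#39; is the numeric entity for an apostrophe.
$html = "<ul>\n    <li>First list item</li>\n    <li>List item with apostrophe &#39;s</li>\n</ul>";

// strip_tags() removes the tags but leaves entities and whitespace as they are.
$text = strip_tags($html);

// Option 1: turn entities back into the characters they stand for.
// ENT_QUOTES makes sure single-quote entities are decoded as well.
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Option 2: drop every named or numeric entity (assumed pattern; adjust as needed).
$removed = preg_replace('/&#?[a-z0-9]+;/i', '', $text);
```

Decoding is usually the safer choice, since removing entities also deletes the characters they represent.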
To remove whitespace from the beginning of your lines, use ltrim():
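Note that ltrim() on its own only trims the start of the whole string, so a sketch like the following applies it line by line. The sample text is a hypothetical stand-in for your strip_tags() output, and dropping the blank lines is an extra assumption about what you want:

```php
<?php
// Hypothetical strip_tags() output with leading indentation and blank lines.
$text = "\n    First list item\n\n    Second list item\n";

// Split on any line-break sequence, then trim the start of each line.
$lines = preg_split('/\R/', $text);
$lines = array_map('ltrim', $lines);

// Drop lines that are empty after trimming.
$lines = array_filter($lines, 'strlen');

// Join the remaining lines back into one string.
$clean = implode("\n", $lines);
```

If trailing whitespace is also a problem, trim() can be used in place of ltrim().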
To keep apostrophes, use this instead:
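One way this could look, assuming the apostrophes arrive as one of a few common entities: convert those to a plain ' first, then strip whatever entities remain. The entity list and the sample input are assumptions, so extend them to match your data:

```php
<?php
// Hypothetical input: stripped text that still contains entities.
$text = "Bob&rsquo;s item &copy;";

// Assumed list of apostrophe-like entities.
$apostrophes = ['&#39;', '&#039;', '&apos;', '&rsquo;', '&#8217;'];

// Convert apostrophe entities to a plain apostrophe first ...
$text = str_replace($apostrophes, "'", $text);

// ... then remove any entities that are left.
$text = preg_replace('/&#?[a-z0-9]+;/i', '', $text);
```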