I need a regex or function that can remove the ENCODED HTML tags from a database record. I have text in a database that is being stored (from TinyMCE) as encoded HTML.
The code has the less-than and greater-than signs encoded as &lt; and &gt;.
I would like to remove all the encoded tags and HTML and just leave the plain text and spaces only.
I’d avoid a regex here, as coming up with something that can cover any and all HTML that a user might foist on you is a task that could keep a full-time employee permanently busy.
Instead, a two-step approach that relies on functionality already present in PHP is a better choice.
First, let’s turn the encoded HTML entities back into greater than and less than signs with htmlspecialchars_decode.
This should give us a string of proper HTML. (If your quotes are still encoded, see the second argument in the linked documentation.)
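A minimal sketch of the decoding step might look like this. The sample string and the variable names are made up for illustration; your stored TinyMCE output will differ. Passing ENT_QUOTES as the second argument is an assumption that you also want encoded single and double quotes turned back into plain quotes.

```php
<?php
// Hypothetical sample of an encoded record as it might sit in the database.
$encoded = '&lt;p&gt;Hello &lt;strong&gt;world&lt;/strong&gt;&lt;/p&gt;';

// Turn &lt; and &gt; (and, with ENT_QUOTES, encoded quotes) back into real characters.
$html = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo $html; // <p>Hello <strong>world</strong></p>
```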
To finish, we’ll strip out the HTML tags with the PHP function strip_tags. This will remove any and all HTML tags from the source.
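As a rough illustration of the stripping step on its own, assuming a decoded string like the hypothetical one here:

```php
<?php
// Hypothetical decoded HTML, i.e. what the previous step would hand us.
$html = '<p>Hello <strong>world</strong></p>';

// strip_tags() removes the tags and keeps the text between them.
$text = strip_tags($html);

echo $text; // Hello world
```

Note that strip_tags only removes the tags themselves; it does not insert spaces where block-level tags used to be, so adjacent paragraphs may run together.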
Wrapped in a function/method
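Something along these lines should do it. The function name encoded_html_to_text is made up for this example, the type declarations assume PHP 7 or later, and ENT_QUOTES is included on the assumption that you want encoded quotes decoded as well.

```php
<?php
/**
 * Decode entity-encoded HTML and return only its text content.
 * The name is hypothetical; rename it to fit your code base.
 */
function encoded_html_to_text(string $encoded): string
{
    // Step 1: turn &lt;, &gt;, &amp; and encoded quotes back into real characters.
    $html = htmlspecialchars_decode($encoded, ENT_QUOTES);

    // Step 2: drop every HTML tag, leaving the text between them.
    return strip_tags($html);
}

// Usage with a made-up record:
echo encoded_html_to_text('&lt;p&gt;It&#039;s &lt;em&gt;plain&lt;/em&gt; text&lt;/p&gt;');
// It's plain text
```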