I wrote a regex to collapse whitespace and other junk, such as newlines and tabs, into a single space.
preg_replace('/\s+/u', ' ', $var);
However, my string is HTML-encoded, which means some characters have been replaced with entities of the form &#…;
How can I account for the encoded characters as well?
I wonder if it is possible to apply a quantifier like that to a group instead of a character class.
Edit
Yes, this appears to be working:
^ produces the expected result:
| t e s t |
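For anyone landing here, this is a minimal sketch of what a quantified group can look like. It is not necessarily the exact pattern used above. The function name `collapse_whitespace` is made up for illustration, and the list of entities (`&#32;`, `&#9;`, `&#10;`, `&#13;`, `&#160;`, `&nbsp;`) is an assumption about which encoded characters show up in the string, so adjust it to your data.

```php
<?php
// Hypothetical helper; the entity list below is an assumption, not taken from the post.
function collapse_whitespace(string $text): string
{
    // (?: ... )+ is a non-capturing group with a quantifier: it matches one or
    // more consecutive items, where each item is either a literal whitespace
    // character or an entity for space (32), tab (9), LF (10), CR (13) or
    // non-breaking space (160 / &nbsp;).
    $pattern = '/(?:\s|&#(?:32|9|10|13|160);|&nbsp;)+/u';

    // preg_replace returns null on error (e.g. invalid UTF-8 with /u),
    // so fall back to the untouched input in that case.
    return preg_replace($pattern, ' ', $text) ?? $text;
}

echo '|' . collapse_whitespace("&#160; t&#9;\n e &#10;s&#13;t &#160;") . '|';
// prints "| t e s t |"
```

Because the quantifier sits on the whole group, a mixed run of real whitespace and entities collapses into one space rather than one space per item.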