I know that there are many types of space (em space, en space, thin space, non-breaking space, etc.), but all the ones I mentioned have HTML entities (at least, PHP’s htmlentities() returns something like &nbsp;).
But, what about those spaces that have no HTML entities?
Example: [example URL not valid anymore]
Look at the nickname of this account. It has many “ ” (spaces) at the front, which are visible to us (this doesn’t happen with &nbsp;).
I have already tried filtering with regular expressions using \x escapes, and with str_replace() using the space itself as the argument, with no luck at all!
Do you have any suggestion on how to filter ALL types of whitespace?
\s, by default, will not match whitespace characters with values greater than 128. To get at those, you can instead make good use of other UTF-8-aware sequences.

(Standard disclaimer: I’m skimming the PCRE source code to compile the lists below, so I may miss a character or type something incorrectly. Please forgive me.)
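\p{Zs} matches: U+0020 Space, U+00A0 No-break space, U+1680 Ogham space mark, U+180E Mongolian vowel separator (reclassified out of Zs in later Unicode versions), U+2000 En quad, U+2001 Em quad, U+2002 En space, U+2003 Em space, U+2004 Three-per-em space, U+2005 Four-per-em space, U+2006 Six-per-em space, U+2007 Figure space, U+2008 Punctuation space, U+2009 Thin space, U+200A Hair space, U+202F Narrow no-break space, U+205F Medium mathematical space and U+3000 Ideographic space.

\h (Horizontal whitespace) matches the same as \p{Zs} above, plus U+0009 Horizontal tab.

Similarly, for matching vertical whitespace there are a few options.

\p{Zl} matches U+2028 Line separator. \p{Zp} matches U+2029 Paragraph separator. \v (Vertical whitespace) matches \p{Zl}, \p{Zp} and the following: U+000A Line feed, U+000B Vertical tab, U+000C Form feed, U+000D Carriage return and U+0085 Next line.

Going back to the beginning, in UTF-8 mode (i.e. using the u pattern modifier) \s will match any character that \p{Z} matches (which is anything that \p{Zs}, \p{Zl} and \p{Zp} will match), plus U+0009 Horizontal tab, U+000A Line feed, U+000C Form feed and U+000D Carriage return.

To cut a long story short (I bet you read all of the above, didn’t you?) you might want to use \s, but make sure to be in UTF-8 mode, like /\s/u. Putting that to some practical use, to filter those matching whitespace characters out of a string you would pass that pattern to preg_replace() with an empty string as the replacement.

Finally, if you really, really care about the vertical whitespace characters which aren’t included in \s (VT and NEL) then you can use the character class [\s\v] to match all 26 of the whitespace characters listed above.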
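
As a rough sketch of the filtering described above (not code from the original post): the helper name strip_all_whitespace() and the sample nickname are made up for illustration, and the example assumes PHP 7 or later (for the "\u{...}" escapes) and valid UTF-8 input.

```php
<?php
// Hypothetical helper, named here for illustration only: removes every
// character matched by \s or \v, with the pattern compiled in UTF-8 mode
// (the u modifier).
function strip_all_whitespace(string $text): string
{
    // preg_replace() returns null on failure, e.g. when $text is not valid UTF-8.
    $result = preg_replace('/[\s\v]+/u', '', $text);
    if ($result === null) {
        throw new InvalidArgumentException('PCRE error code ' . preg_last_error());
    }
    return $result;
}

// Made-up sample: em space (U+2003) and no-break space (U+00A0) in front,
// an ideographic space (U+3000) in the middle, then a tab and a line feed.
$nickname = "\u{2003}\u{00A0}John\u{3000}Doe\t\n";
echo strip_all_whitespace($nickname), PHP_EOL;
```

If only the leading and trailing whitespace should go (as with the nickname in the question), the same character class could probably be anchored instead, for example /^[\s\v]+|[\s\v]+$/u.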