I’m writing a WordPress plugin, and one of the features is removing duplicate whitespace.
My code looks like this:
return preg_replace('/\s\s+/u', ' ', $text, -1, $count);
- I don’t understand why I need the u modifier. I’ve seen other plugins that use preg_replace and don’t need to modify it for Unicode. I believe I have a default installation of WordPress.
- Without the modifier, the code replaces all the spaces with Unicode replacement glyphs instead of spaces.
- With the u modifier, I don’t get the glyphs, and it doesn’t replace all the whitespace.
Each gap below contains from 1 to 10 spaces. The regex only removes one space from each group.
Before:
This sentence has extra space. This doesn’t. Extra space, Lots of extra space.
After:
This sentence has extra space. This doesn’t. Extra space, Lots of extra space.
$count = 9
How can I make the regex replace the whole match with the one space?
Update: If I try this with plain PHP, it works fine:
$new_text = preg_replace('/\s\s+/', ' ', $text, -1, $count);
It only breaks when I use it within the WordPress plugin.
I’m using this function in a filter:
function jje_test( $text ) {
$new_text = preg_replace('/\s\s+/', ' ', $text, -1, $count);
echo "Count: $count";
return $new_text;
}
add_filter('the_content', 'jje_test');
I have tried:
- Removing all other filters on the_content: remove_all_filters('the_content');
- Changing the priority of the filter added to the_content, earlier or later
- All kinds of permutations of \s+, \s\s+, [ ]+ etc.
- Even replacing all single spaces with an empty string, which will not replace the spaces
This will replace all sequences of two or more spaces, tabs, and/or line breaks with a single space:
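A minimal sketch of such a pattern, assuming a character class that combines \p{Z} and \s with a {2,} quantifier, as the explanation here describes. The function name collapse_whitespace is hypothetical and only used for illustration:

```php
<?php
// Hypothetical helper: collapse every run of two or more whitespace
// characters (ASCII or Unicode separators) into a single ASCII space.
function collapse_whitespace( $text ) {
    // \p{Z} covers Unicode separators such as the non-breaking space,
    // \s covers spaces, tabs, and line breaks, {2,} means "two or more".
    // The u modifier tells PCRE to treat pattern and subject as UTF-8.
    return preg_replace( '/[\p{Z}\s]{2,}/u', ' ', $text );
}

echo collapse_whitespace( "Extra   space,\t\n lots    of it." ), "\n";
```

Note that with the u modifier, preg_replace returns null when $text is not valid UTF-8, so the caller may want to check for that before using the result.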
You need the /u flag if $text holds text encoded as UTF-8. Even if there are no Unicode characters in your regex, PCRE has to interpret $text correctly.
I added \p{Z} to the character class because PCRE only matches ASCII characters when using shorthands such as \s, even when using /u. Adding \p{Z} makes sure all Unicode whitespace is matched. There might be other spaces such as non-breaking spaces in your string.
I’m not sure if using echo in a WordPress filter is a good idea.
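One way the filter from the question could avoid echo is sketched below. This is an assumption-laden example, not a confirmed fix: the function name jje_collapse_whitespace is hypothetical, the count is written to the PHP error log only when WP_DEBUG is on, and the add_filter call is guarded so the file also runs outside WordPress.

```php
<?php
// Hypothetical variant of the question's jje_test filter that returns
// the filtered content without printing anything into the page.
function jje_collapse_whitespace( $text ) {
    $new_text = preg_replace( '/[\p{Z}\s]{2,}/u', ' ', $text, -1, $count );

    // preg_replace returns null on error (for example, invalid UTF-8
    // input when the u modifier is used); fall back to the original text.
    if ( null === $new_text ) {
        return $text;
    }

    // Log the replacement count instead of echoing it, and only in debug mode.
    if ( defined( 'WP_DEBUG' ) && WP_DEBUG ) {
        error_log( "jje_collapse_whitespace replaced $count runs" );
    }

    return $new_text;
}

// Register the filter only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'the_content', 'jje_collapse_whitespace' );
}
```

Returning the original text on failure keeps the post content visible even if the input turns out not to be valid UTF-8.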