I’m trying to remove repeated whitespace characters from a UTF-8 string in PHP using regex.


Asked: June 14, 2026 (2026-06-14T14:52:51+00:00)


I’m trying to remove repeated whitespace characters from a UTF-8 string in PHP using a regex.
This regex

    $txt = preg_replace( '/\s+/i' , ' ', $txt );

usually works fine, but some of the strings contain the Cyrillic letter “Р”, which gets corrupted by the replacement.
After a little research I realized that the letter is encoded in UTF-8 as the two bytes \xD0 \xA0, and since \xA0 is the non-breaking space in ISO-8859-1 (not ASCII), the regex replaces that byte with \x20 and the character is no longer valid.
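The byte-level picture can be checked with a few lines of PHP. This is only a sketch: it assumes PHP 7 or later (for the `\u{...}` escape) and the mbstring extension, and it shows only the encoding, not the regex behaviour itself, which can vary with locale and PCRE version.

```php
<?php
// "Р" is U+0420 (CYRILLIC CAPITAL LETTER ER); UTF-8 stores it as two bytes.
$letter = "\u{0420}";

$hex = bin2hex($letter);   // hex dump of the raw bytes
$len = strlen($letter);    // strlen counts bytes, not characters

// If the second byte (\xA0) is swapped for a space (\x20), the lead byte
// \xD0 is left without its continuation byte, so the string is invalid UTF-8.
$broken = "\xD0\x20";
$stillValid = mb_check_encoding($broken, 'UTF-8');

echo $hex, "\n";                              // d0a0
echo $len, "\n";                              // 2
echo $stillValid ? "valid" : "invalid", "\n"; // invalid
```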

Any ideas how to do this properly in PHP with regex?


1 Answer

Editorial Team · Answer 1 · 2026-06-14T14:52:52+00:00

This is described at http://www.php.net/manual/en/function.preg-replace.php#106981:

If you want to match characters from any script (European, Russian, Chinese, Japanese, Korean, or whatever), just:

call mb_internal_encoding('UTF-8');
use preg_replace('...u', '…', $string) with the u (Unicode) modifier

For further information, the complete list of preg_* modifiers can be found at:
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
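
Applied to the regex from the question, the advice above might look like the following sketch. The function name `collapse_whitespace` is hypothetical (it does not come from the original post), the code assumes PHP 7 or later, and it relies only on the `u` modifier; as far as I know, `mb_internal_encoding()` affects the mb_* functions rather than preg_*.

```php
<?php
// Hypothetical helper, named here only for illustration.
function collapse_whitespace(string $txt): string
{
    // With /u, PCRE treats pattern and subject as UTF-8, so \s is tested
    // against whole code points instead of individual bytes.
    $result = preg_replace('/\s+/u', ' ', $txt);

    // preg_replace returns null on error (for example, when the subject
    // is not valid UTF-8); this sketch falls back to the untouched input.
    return $result ?? $txt;
}

// "Россия", then a run of spaces, a tab and a newline, then "РФ".
$input  = "\u{0420}\u{043E}\u{0441}\u{0441}\u{0438}\u{044F}  \t\n \u{0420}\u{0424}";
$output = collapse_whitespace($input);

echo $output, "\n";
```

Falling back to the original string on failure is a design choice for this sketch; checking `preg_last_error()` and raising an error may suit real code better.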
