I am working with regular expressions to transform HTML into BBCODE. But, with code

Question


Asked: June 6, 2026 (2026-06-06T21:46:25+00:00)


I am working with regular expressions to transform HTML into BBCode. With code coming from WYSIWYG editors (TinyMCE), however, I am running into issues. It is a very curious case:

There are some typical blank paragraphs, <p> </p>, but I cannot match them in any way. None of the following attempts works:

str_replace("<p>&nbsp;</p>",........)
str_replace("<p> </p>".........)
preg_replace("#<p>.?</p>#"....)

This DOES work, but what if the “spaces” appear in other places? How could I match them then?

preg_replace("#<p>.{1,6}</p>#"....)

How can I match all the &nbsp; entities even when they are not written out? In the DB, where the original string is stored, no &nbsp; appears; there are just <p> </p> blocks. It is quite strange…


1 Answer

Editorial Team · Answer 1 · 2026-06-06T21:46:26+00:00


I recommend reading up on Unicode regular expressions and the Wikipedia article on Unicode whitespace characters.
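One likely explanation for the failed attempts, assuming the stored paragraphs contain the raw character U+00A0 (NO-BREAK SPACE) encoded as UTF-8 rather than the literal text &nbsp;: that character is two bytes long, so neither an ASCII space nor a byte-oriented `.?` can match it. The sketch below reproduces this under that assumption; `$blank` is an illustrative value, not data from the question.

```php
<?php
// Assumption: the blank paragraph holds U+00A0 encoded as UTF-8.
$blank = "<p>\u{00A0}</p>";

// The "space" is the two bytes c2 a0, not the ASCII space 0x20.
echo bin2hex("\u{00A0}"), "\n";

// A literal ASCII space never matches, so the string comes back unchanged.
var_dump(str_replace("<p> </p>", '', $blank) === $blank);

// Without the u modifier "." consumes one byte, so ".?" cannot span two bytes.
var_dump(preg_match('#<p>.?</p>#', $blank));

// ".{1,6}" succeeds only because it may consume both bytes.
var_dump(preg_match('#<p>.{1,6}</p>#', $blank));

// With the u modifier "." consumes one code point, and \p{Z} names
// the Unicode separator category, which includes U+00A0.
var_dump(preg_match('#<p>.?</p>#u', $blank));
var_dump(preg_match('#<p>\p{Z}*</p>#u', $blank));
```

If `bin2hex` on a real row from the database shows `c2a0` between the tags, this is the case you are in.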

Script:

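// Sample input: a paragraph holding only spaces and &nbsp; entities.
$string = '123<p>  &nbsp;  &nbsp;  </p>abc';
// Match <p>...</p> whose body is only &nbsp; entities, whitespace (\s),
// Unicode separators (\p{Z}), or invisible "other" characters (\p{C}).
// The u modifier makes PCRE treat pattern and subject as UTF-8.
$pattern = '/<p>(&nbsp;|[\s\p{Z}\p{C}\x85\xA0\x{0085}\x{00A0}\x{FFFD}]+)*<\/p>/iu';
$replacement = '';
echo preg_replace($pattern, $replacement, $string);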

Output:

123abc
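If you want to reuse this, the script could be wrapped in a small function. The version below is only a sketch: the name `strip_blank_paragraphs` is hypothetical, and the pattern is a simplified variant of the one above, not the same expression. It drops the inner `+` and makes the group possessive, because a quantified class inside a repeated group may backtrack heavily on long whitespace runs that are not followed by </p>. It also drops the explicit code points, which the `\p{Z}` and `\p{C}` categories should already cover.

```php
<?php
// Hypothetical helper, not part of the original answer.
function strip_blank_paragraphs(string $html): string
{
    // Each iteration consumes either the literal entity or one
    // whitespace/separator/invisible code point; "*+" forbids backtracking.
    $pattern = '/<p>(?:&nbsp;|[\s\p{Z}\p{C}])*+<\/p>/iu';
    $result = preg_replace($pattern, '', $html);

    // preg_replace returns null on error (for example invalid UTF-8 input),
    // in which case the input is handed back untouched.
    return $result ?? $html;
}

// Mixed ASCII spaces, a raw U+00A0 and a written-out entity.
echo strip_blank_paragraphs("123<p> \u{00A0}&nbsp; </p>abc<p>keep</p>"), "\n";
```

Like the script above, this only handles a bare <p> tag; paragraphs with attributes are left alone.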

Note: To match any single Unicode grapheme, use the pattern \P{M}\p{M}*+ (one non-mark code point followed by any combining marks).
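A minimal illustration of the grapheme pattern, using a sample string chosen here for demonstration: "e" followed by the combining acute accent U+0301, then a plain "a".

```php
<?php
// "e" + U+0301 (COMBINING ACUTE ACCENT) forms one grapheme; "a" is another.
$text = "e\u{0301}a";

// \P{M} takes one non-mark code point, \p{M}*+ takes all marks attached to it.
preg_match_all('/\P{M}\p{M}*+/u', $text, $matches);

echo count($matches[0]), "\n";        // number of graphemes found
echo bin2hex($matches[0][0]), "\n";   // bytes of the first grapheme
```

The first match keeps the base letter and its accent together (bytes 65 cc 81), so the string splits into two graphemes even though it holds three code points.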
