I am working with regular expressions to transform HTML into BBCode. However, with code coming from former WYSIWYG editors (TinyMCE), I am running into issues. It is a very curious case:
There are some typical blank paragraphs, <p> </p>, but I cannot match them in any way. None of the following attempts work:
str_replace("<p> </p>",........)
str_replace("<p> </p>".........)
preg_replace("#<p>.?</p>#"....)
This DOES work, but what if the “spaces” appear in other places? How could I match them?
preg_replace("#<p>.{1,6}</p>#"....)
How can I get it to match all the &nbsp; entities even when they are not written out? (In the DB, where the original string is stored, they are not written; there are just <p> </p> blocks.) It is quite strange…
I recommend reading Unicode Regular Expressions and Wikipedia: Unicode Whitespace character.
Script:
Output:
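As a minimal sketch of this approach (not necessarily the original script), assume the stored string is UTF-8 and the "blank" paragraphs actually contain U+00A0 (NO-BREAK SPACE), which is two bytes in UTF-8. Without the `u` modifier, `.` matches a single byte, which would explain why `.?` fails while `.{1,6}` succeeds. The sample string and variable names below are hypothetical.

```php
<?php
// Assumption: the input is UTF-8; "\xC2\xA0" is the UTF-8 encoding of U+00A0.
$html = "<p>Hello</p><p>\xC2\xA0</p><p> \xC2\xA0\t</p><p>World</p>";

// Without /u the subject is treated as bytes, so ".?" cannot span the
// two bytes of U+00A0 and the pattern does not match.
var_dump(preg_match('#<p>.?</p>#', "<p>\xC2\xA0</p>"));  // int(0)

// With /u the pattern and subject are treated as UTF-8, so "." matches
// one whole code point.
var_dump(preg_match('#<p>.?</p>#u', "<p>\xC2\xA0</p>")); // int(1)

// \p{Z} matches any Unicode separator (including U+00A0);
// \s additionally covers tabs and newlines.
// Note: "*" also removes completely empty <p></p> blocks.
$clean = preg_replace('#<p>[\p{Z}\s]*</p>#u', '', $html);

echo $clean, "\n"; // <p>Hello</p><p>World</p>
```

If empty `<p></p>` blocks should be kept, `+` can be used in place of `*`.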
Note: To match any single Unicode grapheme, use the pattern
\P{M}\p{M}*+
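To illustrate the grapheme pattern, here is a small sketch that assumes UTF-8 input and uses a hypothetical sample string: the letter "e" followed by U+0301 (COMBINING ACUTE ACCENT), then "a". That is three code points but two graphemes.

```php
<?php
// "\xCC\x81" is the UTF-8 encoding of U+0301 (combining acute accent).
$text = "e\xCC\x81a";

// \P{M} matches one non-mark code point, \p{M}*+ possessively matches
// any combining marks that follow it. The /u modifier enables UTF-8 mode.
preg_match_all('/\P{M}\p{M}*+/u', $text, $matches);
$graphemes = $matches[0];

echo count($graphemes), "\n"; // 2
```

PCRE also provides the `\X` escape for matching an extended grapheme cluster, which may be a simpler alternative where it is available.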