I import some text from an XML file, trim it, and collapse runs of whitespace into a single space:
$var = $myxmltext;
$var = trim($var);
$var = preg_replace('/\s+/',' ',$var);
For some reason, when I echo it, I get “raw HTML” like this:
quot; or IÂ’ve instead of I've
Any ideas why?
Here is my trim function:
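function mytrim($mytrim){
    // Convert from UTF-8 to ISO-8859-1
    $mytrim = utf8_decode($mytrim);
    $mytrim = trim($mytrim);
    $rule1 = array(
        ",", // comma
        ".", // period
        "~", // tilde
        "_", // underscore
        "-", // hyphen
        ")", // closing parenthesis
        ":", // colon
        ">", // greater than
        "<", // less than
        "!", // exclamation mark
        "?", // question mark
        "*", // asterisk
        "&"  // ampersand
    );
    $rule2 = array(
        ", ", // comma
        ". ", // period
        " ~ ", // tilde
        " ", // underscore
        " - ", // hyphen
        ") ", // closing parenthesis
        ": ", // colon
        " > ", // greater than
        " < ", // less than
        "! ", // exclamation mark
        "? ", // question mark
        "* ", // asterisk
        " & "  // ampersand
    );
    // Add spacing around punctuation: each $rule1 entry becomes the matching $rule2 entry
    $mytrim = str_replace($rule1, $rule2, $mytrim);
    $rule3 = array(
        " .", // period
        " ,", // comma
        " ?", // question mark
        " !", // exclamation mark
        " *", // asterisk
        " )"  // closing parenthesis
    );
    $rule4 = array(
        ".", // period
        ",", // comma
        "?", // question mark
        "!", // exclamation mark
        "*", // asterisk
        ")"  // closing parenthesis
    );
    // Remove the space in front of closing punctuation
    $mytrim = str_replace($rule3, $rule4, $mytrim);
    // Collapse runs of whitespace into a single space
    $mytrim = preg_replace('/\s+/',' ',$mytrim);
    return $mytrim;
}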
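One plausible source of the stray "quot;", assuming the XML text already contains entities such as &quot;, is the last entry of $rule1/$rule2: it turns every & into " & ", which splits the entity apart. A minimal sketch that applies only that rule plus the final whitespace collapse to a made-up sample string:

```php
<?php
// Made-up sample input containing an HTML entity.
$text = 'say &quot;hi&quot;';

// The "&" rule from mytrim(): every "&" gets a space on each side,
// including the "&" that starts "&quot;".
$text = str_replace('&', ' & ', $text);

// Collapse whitespace the same way mytrim() does at the end.
$text = preg_replace('/\s+/', ' ', $text);

echo $text, PHP_EOL; // say & quot;hi & quot;
```

Once the entity is broken, a browser no longer recognizes it and shows "& quot;" literally.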
Try this regex before you do your other processing:
Then carry on with the rest of your processing and check whether the HTML is encoded correctly now.
This should solve your main problem of HTML encoding by changing all:
to:
Be aware: this might not work exactly as expected, so please test it.
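As an alternative to the regex approach above (a different technique, not what this answer proposes), PHP's built-in html_entity_decode() can turn entities back into real characters before any trimming rules run, with htmlspecialchars() applied only when the text is written back into HTML. A minimal sketch on a made-up sample string, assuming the input is UTF-8:

```php
<?php
// Made-up sample input with named and numeric entities.
$raw = 'say &quot;hi&quot; &amp; I&#8217;ve';

// Decode entities into real UTF-8 characters before any trimming,
// so later str_replace() rules never see a bare "&quot;".
$decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Re-encode only at output time, so quotes and "&" are safe in HTML.
echo htmlspecialchars($decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8'), PHP_EOL;
```

Decoding first and encoding last keeps the punctuation rules working on plain characters instead of entity text.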
Of course, as others have said, you can also use utf8_decode()/utf8_encode() to get rid of those umlaut characters (note that both functions are deprecated as of PHP 8.2).
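The "IÂ’ve" pattern is typical of a Windows-1252 right single quote (byte 0x92) being treated as ISO-8859-1 and converted to UTF-8, which is the conversion utf8_encode() performs; whether that is exactly what happened to this XML file is an assumption. The sketch below uses a hypothetical helper, latin1ToUtf8(), that does the same byte conversion by hand so it runs without the deprecated functions:

```php
<?php
// Hypothetical helper: ISO-8859-1 -> UTF-8 conversion done by hand.
function latin1ToUtf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            // ASCII bytes pass through unchanged
            $out .= chr($b);
        } else {
            // Latin-1 code points 0x80-0xFF become two-byte UTF-8 sequences
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

// "I've" written with a Windows-1252 right single quote (byte 0x92).
$cp1252 = "I\x92ve";
$converted = latin1ToUtf8($cp1252);

// 0x92 became the bytes C2 92. A page rendered as Windows-1252 shows
// C2 as "Â" and 92 as "’", which is where "IÂ’ve" comes from.
echo bin2hex($converted), PHP_EOL; // 49c2927665
```

Treating the source as Windows-1252 instead of ISO-8859-1 during conversion maps 0x92 to U+2019 and avoids the stray "Â".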
Edit
To solve the ampersand problem, try:
This replaces every & that is not part of an entity such as &quot; and puts a space on either side of it. As before, test it first.