I’m trying to replace the string “Red Dwarf (TV Series 1988—) – IMDb” with “Red Dwarf (TV Series 1988′) – IMDb”.
I have a translation table of these funny characters in an array. I tried to replace them using str_replace, but it did not work. Can anybody suggest a workaround? Here is the relevant snippet:
function replaceFunnyChar( $input ){
$translation = array(
'’' => "'",
"â€\"" => '-',
'é' => 'é',
'è' => 'è',
'“' => '"',
'â€' => '"',
'‘' => "'",
'â' => 'ã',
'Ã"' => 'ä',
'â€"' => '–',
'Ä«' => 'ī',
'阴' => '阴',
'é™°' => '陰',
"阳" => "阳",
"陽" => "陽",
'´' => "'",
'ü' => 'ü',
"Ã,Ã'" => "'",
'•' => '–'
);
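// Start from the input and feed each result back in; calling str_replace
// on $input every time would keep only the last replacement.
$output = $input;
foreach( $translation as $find => $replace ){
$output = str_replace($find, $replace, $output );
//$output = preg_replace("/" . $find . "/", $replace, $output );
}
return $output;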
}
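A table like this can also be applied in a single call with strtr. When given an array, strtr looks for the longest matching key at each position and does not re-scan text it has already replaced, which matters when keys overlap (as 'â€' and 'â€"' do). A minimal sketch, where the function name replaceWithTable and the shortened table are only for illustration:

```php
<?php
// Apply a whole translation table in one pass.
// strtr() with an array prefers the longest matching key at each position
// and never re-replaces text it has already substituted.
// replaceWithTable is a hypothetical helper name, not part of the question.
function replaceWithTable( $input, array $translation ){
    return strtr( $input, $translation );
}

// Shortened table for illustration; the overlapping keys come from the question.
$table = array(
    'â€'  => '"',
    'â€"' => '–',
    'â'   => 'ã',
);

echo replaceWithTable( '1988â€") and â', $table ), "\n";
```

With a plain str_replace loop, the shorter key 'â€' would fire first and the longer key could never match; the longest-first rule avoids having to order the table by hand.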
It is best to detect the encoding of the data you have. If you are scraping, the HTTP Content-Type header declares it, and a meta tag in the HTML is the fallback when the header names no charset. Then you can use something such as iconv to convert it: http://php.net/manual/en/book.iconv.php
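As a rough sketch of that approach, the charset can be pulled out of the Content-Type value and passed to iconv. The helper names charsetFromContentType and toUtf8, and the ISO-8859-1 fallback, are assumptions made for this example; how you obtain the header depends on your HTTP client.

```php
<?php
// Extract the charset from a Content-Type value such as
// "text/html; charset=ISO-8859-1". Returns null when none is declared.
function charsetFromContentType( $contentType ){
    if ( preg_match( '/charset\s*=\s*"?([A-Za-z0-9._:-]+)/i', $contentType, $m ) ) {
        return strtoupper( $m[1] );
    }
    return null;
}

// Convert $body to UTF-8, trusting the declared charset.
// The fallback charset is a guess and only an assumption of this sketch.
function toUtf8( $body, $contentType, $fallback = 'ISO-8859-1' ){
    $charset = charsetFromContentType( $contentType );
    if ( $charset === null ) {
        $charset = $fallback;
    }
    if ( $charset === 'UTF-8' ) {
        return $body; // already UTF-8, nothing to convert
    }
    return iconv( $charset, 'UTF-8', $body );
}

// "caf\xE9" is "café" in ISO-8859-1.
echo toUtf8( "caf\xE9", 'text/html; charset=ISO-8859-1' ), "\n";
```

If the declared charset is wrong, the conversion will still run and produce garbage, so it is worth spot-checking the result.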
If the data you get is UTF-8, you don’t actually need to convert it. Just store it and make sure your DBMS is set up to support UTF-8. Then when displaying the data again, make sure you specify UTF-8 on your webpage.
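A small sketch of the "check it, then declare it" idea, leaving the database side out. The function names isValidUtf8 and renderPage are hypothetical; the check relies on the PCRE /u modifier, which rejects subjects that are not well-formed UTF-8.

```php
<?php
// True when $text is a well-formed UTF-8 byte sequence.
// With the /u modifier, preg_match() returns false for an invalid subject.
function isValidUtf8( $text ){
    return preg_match( '//u', $text ) === 1;
}

// Build an HTML page that declares UTF-8 in the markup itself.
function renderPage( $title ){
    return "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
         . '<title>' . htmlspecialchars( $title, ENT_QUOTES, 'UTF-8' ) . '</title>'
         . "</head><body></body></html>\n";
}

// When serving over HTTP, the header should agree with the meta tag:
// header( 'Content-Type: text/html; charset=utf-8' );
```

Data that fails the check is probably in some other encoding and needs converting before it is stored.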
If you are using the Windows command line to show the characters, it is a little more complicated, as the Windows command line does not use UTF-8 by default. Try Ubuntu or Mac OS X.
Also, if you already have the data and cannot download it again, check how you are displaying the characters. If they are shown on a webpage, the web browser can further mangle them when it uses a different encoding from the one the data is actually in. You can also dump the bytes out and do the replacement using byte sequences instead of quoted strings as in the original code.
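Dumping the bytes and replacing by byte sequence could look roughly like this. The sample title and the key in $byteTable are assumptions for illustration: the key is what a UTF-8 em dash (E2 80 94) turns into after being misread as Windows-1252 and encoded to UTF-8 a second time. Your data may contain different bytes, which is exactly what the dump is for.

```php
<?php
// Show the raw bytes of a string as space-separated hex pairs.
function dumpBytes( $text ){
    return implode( ' ', str_split( bin2hex( $text ), 2 ) );
}

// Keys written as explicit byte escapes, so the encoding of the source
// file cannot silently change them. Hypothetical one-entry table.
$byteTable = array(
    "\xC3\xA2\xE2\x82\xAC\xE2\x80\x9D" => '-',
);

// Hypothetical sample input containing the double-encoded dash.
$title = "Red Dwarf (TV Series 1988\xC3\xA2\xE2\x82\xAC\xE2\x80\x9D)";

echo dumpBytes( substr( $title, 25 ) ), "\n"; // bytes after "1988"
echo strtr( $title, $byteTable ), "\n";
```

Comparing the dump against the table keys shows immediately whether a key can ever match, which is hard to judge from how the characters happen to render on screen.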