I have a large file that contains world countries/regions that I’m separating into smaller files based on individual countries/regions. The original file contains entries like:
EE.04 Järvamaa
EE.05 Jõgevamaa
EE.07 Läänemaa
However, when I extract that and write it to a new file, the text becomes:
EE.04 Järvamaa
EE.05 Jõgevamaa
EE.07 Läänemaa
To save my files I’m using the following code:
mb_detect_encoding($text, "UTF-8") == "UTF-8" ? : $text = utf8_encode($text);
$fp = fopen(MY_LOCATION,'wb');
fwrite($fp,$text);
fclose($fp);
I tried saving the files with and without utf8_encode() and neither seems to work. How would I go about preserving the original encoding (which is UTF-8)?
Thank you!
First off, don’t depend on mb_detect_encoding. It’s not great at figuring out what the encoding is unless there’s a bunch of encoding-specific entities (meaning entities that are invalid in other encodings).
Try just getting rid of the mb_detect_encoding line altogether.
Oh, and utf8_encode turns a Latin-1 string into a UTF-8 string (not from an arbitrary charset to UTF-8, which is what you really want)… You want iconv, but you need to know the source encoding (and since you can’t really trust mb_detect_encoding, you’ll need to figure it out some other way).
Or you can try using iconv with an empty input encoding, $str = iconv('', 'UTF-8', $str); (which may or may not work)…
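As a rough sketch of the advice above (not a definitive fix), the snippet below validates the bytes with mb_check_encoding instead of guessing with mb_detect_encoding, and only calls iconv when the text is not already valid UTF-8. The helper name to_utf8, the fallback source encoding ISO-8859-1, and the output path are assumptions made for illustration; the real source encoding has to be determined some other way. It assumes the mbstring and iconv extensions are loaded.

```php
<?php
// Hypothetical helper: return $text as UTF-8, converting only when needed.
// $assumedSource is a guess for illustration; replace it with the real source encoding.
function to_utf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    // mb_check_encoding validates the byte sequence; it does not guess an encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8: keep the bytes unchanged
    }

    // Convert from the assumed source encoding to UTF-8.
    $converted = iconv($assumedSource, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $assumedSource to UTF-8");
    }
    return $converted;
}

// Hypothetical output path, standing in for MY_LOCATION in the question.
$path = sys_get_temp_dir() . '/EE.txt';
$text = "EE.04 J\xC3\xA4rvamaa\n"; // "Järvamaa" spelled out as UTF-8 bytes

// file_put_contents writes the bytes as-is, like fopen/fwrite/fclose with 'wb'.
file_put_contents($path, to_utf8($text));
```

If the input really is UTF-8, the helper returns it untouched and the file ends up with the same bytes as the source. In that case any garbling seen afterwards may come from whatever is reading the file in a different encoding rather than from the write itself.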