Is it possible, prior to converting a string from one charset to another, to know whether the conversion will be lossless?
If I try to convert a UTF-8 string to latin1, for example, the chars that can’t be converted are replaced by ?. Checking for ? in the result string to find out whether the conversion was lossless is obviously not an option, since the original string may itself contain a ?.
The only solution I can see right now is to convert back to the original charset, and compare to the original string:
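function canBeSafelyConverted($string, $fromEncoding, $toEncoding)
{
    // Round trip: convert to the target charset, then back to the original one.
    $encoded = mb_convert_encoding($string, $toEncoding, $fromEncoding);
    $decoded = mb_convert_encoding($encoded, $fromEncoding, $toEncoding);
    // Strict comparison, so numeric-looking strings are not compared loosely.
    return $decoded === $string;
}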
This is just a quick and dirty one though, and it may behave unexpectedly at times. I guess there might be a cleaner way to do this with mbstring, iconv, or some other library.
An alternative is to set up your own error handler with set_error_handler(). If you use iconv() on the string and it cannot be fully converted, iconv() raises a notice, which you can catch in your handler and react to in your code.
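A rough sketch of the error handler idea, in case it helps. The function name convertsWithoutNotice() is made up for this example, and I am assuming an iconv implementation that raises a notice (and returns false) on characters it cannot convert. Some implementations substitute a placeholder silently instead, so test this on your own platform first.

```php
<?php
// Hypothetical helper (name invented for this sketch): returns true when
// iconv() converts $string without raising a notice or warning.
function convertsWithoutNotice(string $string, string $fromEncoding, string $toEncoding): bool
{
    $failed = false;

    // Temporarily route notices and warnings to a closure that records the failure.
    set_error_handler(function (int $errno, string $errstr) use (&$failed): bool {
        $failed = true;
        return true; // handled here, so PHP's default handler does not run
    }, E_NOTICE | E_WARNING);

    try {
        $result = iconv($fromEncoding, $toEncoding, $string);
    } finally {
        // Always put the previous handler back, even if iconv() throws.
        restore_error_handler();
    }

    return !$failed && $result !== false;
}

// "café" fits into ISO-8859-1; the euro sign does not.
var_dump(convertsWithoutNotice("caf\u{E9}", 'UTF-8', 'ISO-8859-1'));
var_dump(convertsWithoutNotice("\u{20AC}", 'UTF-8', 'ISO-8859-1'));
```

The handler is restored in a finally block so the rest of the application keeps its normal error handling.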
Or you could just count the number of question marks before and after encoding and compare the two. Or call iconv() with //IGNORE and compare the number of characters before and after.
None of these suggestions is much more elegant than yours, but they get rid of the double processing.
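The question mark count could look something like this. Again, only a sketch: convertsWithoutSubstitution() is a name I made up, and it assumes that mbstring's substitute character is still the default ? and that both encodings are ASCII-compatible, so that ? is always the single byte 0x3F.

```php
<?php
// Hypothetical helper (name invented for this sketch): the conversion is treated
// as lossless when it does not introduce any additional '?' characters.
// Assumes mb_substitute_character() is left at its default ('?') and that both
// encodings are ASCII-compatible (e.g. UTF-8 and ISO-8859-1).
function convertsWithoutSubstitution(string $string, string $fromEncoding, string $toEncoding): bool
{
    $converted = mb_convert_encoding($string, $toEncoding, $fromEncoding);

    // Byte-wise counting is enough here because '?' is a single byte in both encodings.
    return substr_count($converted, '?') === substr_count($string, '?');
}

// A question mark that was already in the input does not count as a loss.
var_dump(convertsWithoutSubstitution("Is it caf\u{E9}?", 'UTF-8', 'ISO-8859-1'));
var_dump(convertsWithoutSubstitution("Price: 5 \u{20AC}", 'UTF-8', 'ISO-8859-1'));
```

This converts only once, but it will give wrong answers if the substitute character has been changed or if one of the encodings is something like UTF-16.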