I have a website that accepts forum submissions from users in different locales; English and Swedish are the currently “supported” locales. Most posts on the forum are in Swedish, and that is where I run into an intermittent character encoding problem.
Could it be that some browsers are sending me ISO 8859 encoded strings even though the page is encoded in UTF-8 (and should be submitted in that encoding)? My PHP server side guesses the encoding with functions like mb_detect_encoding, but that doesn’t seem to help.
I have this code to “guess” the encoding of the submissions:
if ( mb_detect_encoding($str, 'UTF-8, ISO-8859-1') == 'ISO-8859-1') {
    return mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
}
return $str;
Other encodings are not an issue for this particular problem.
Any help would be appreciated.
The browser may send data in any character encoding, regardless of the character encoding of your HTML page. It should advertise the encoding it used in the Content-Type header of the request. You can use the accept-charset attribute on the form element to specify which character encodings you want to receive.
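As a rough sketch of how this could look on the server side, assuming the form is declared with accept-charset="UTF-8" and that ISO-8859-1 is the only other encoding you expect (as stated in the question): instead of guessing with mb_detect_encoding, you can test whether the bytes are valid UTF-8 and convert only when they are not. The function name normalize_to_utf8 and its $fallback parameter are hypothetical, chosen for illustration only.

```php
<?php
// Hypothetical helper: the name and the default fallback encoding are
// assumptions for illustration, not part of any library.
function normalize_to_utf8(string $str, string $fallback = 'ISO-8859-1'): string
{
    // Bytes that already form valid UTF-8 are returned unchanged.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Anything else is assumed to be in the fallback encoding and converted.
    return mb_convert_encoding($str, 'UTF-8', $fallback);
}
```

For example, normalize_to_utf8("\xE5") (the single ISO-8859-1 byte for "å") would yield the two-byte UTF-8 sequence "\xC3\xA5", while a string that is already valid UTF-8 passes through untouched. This only works because ISO-8859-1 text containing Swedish characters is almost never valid UTF-8 by accident; with more candidate encodings the check would not be enough to tell them apart.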