Assume I’m on a computer that uses character set “a” and I’m browsing a page that uses character set “b”, where “a” and “b” are wildly different. Specifically, the character code for a space in “a” is not a space in “b”.
If I type a space into a text input on the page, would the page register it as a space? And, when sent to be processed by the server, would that be processed like a space or another character?
Ignore what character set your computer and server are “on”; that doesn’t matter. What matters is the character set of the given HTTP request/response. If you request a resource and the server says the response uses character set “B”, then your browser will try to parse the response using character set “B”. Most browsers can parse many different character sets regardless of the underlying computer’s current language settings. If your browser doesn’t know about the supplied character set (which would be a rare case; my IE has 34 character sets and my Firefox has 74), then the behavior is undefined. It may guess or it might throw an error; it’s up to the browser to decide.
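To make that concrete, here is a minimal Python sketch of "decode using whatever the response declares". The function name `decode_body` and the ISO-8859-1 fallback are my own assumptions for illustration; real browsers have their own, more involved rules for unknown or missing charsets.

```python
import codecs
from email.message import Message
from typing import Tuple


def decode_body(content_type: str, body: bytes,
                fallback: str = "iso-8859-1") -> Tuple[str, str]:
    """Decode a response body using the charset named in its Content-Type.

    Returns (text, charset_used). The fallback is an assumption made for
    this sketch, not something a specification mandates.
    """
    # Let the standard library parse the header parameters.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback

    # If this Python build has no codec for the declared charset,
    # fall back to a guess, much as a browser might.
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = fallback

    return body.decode(charset, errors="replace"), charset
```

Note that the machine's own locale never appears here: only the declared charset and the bytes of the response matter.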
Many (or possibly most) character sets are partly based on ASCII and therefore map the first 128 code points (0 to 127) the same way. Even the double-byte ISO/IEC 2022 encodings do. All HTML tags are made up of ASCII characters, so in these cases the browser might guess at the encoding (some might assume ISO-8859-1) and should at least be able to render the structure of the document. However, some encodings, such as the various flavors of EBCDIC, don’t map to ASCII. In some versions, the byte that EBCDIC uses for < is the byte that ASCII uses for L, so HTML rendering would completely fail and the raw bytes (probably parsed as ISO-8859-1) would be displayed instead.
So if your browser encounters a SHIFT_JIS document but doesn’t know how to parse the bytes, it will probably attempt to parse it using ISO-8859-1. Because the first 128 code points in SHIFT_JIS map the same as in ISO-8859-1 (for the most part), all the HTML should render just fine. The text, however, will come out wrong, typically as unrelated accented letters or as the browser’s “unknown” character, which is sometimes a question mark or some form of boxed character. If this document has a form in it and you start typing in it, the keys on your keyboard will be mapped to whatever character set the browser is guessing at, which is once again probably ISO-8859-1. When you hit submit, those characters will be encoded in the “guessed” character set and sent to the server as such.
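The byte-level claims above can be checked with Python’s built-in codecs. This is only a sketch: `cp037` is one EBCDIC variant that I picked for illustration (others differ in detail), and decoding with ISO-8859-1 stands in for the browser’s guess.

```python
# Compare how a few encodings represent the same characters.
# "cp037" is one EBCDIC variant, chosen here only as an example.

# 1. A typed space: identical in the ASCII-based encodings, different in EBCDIC.
space_bytes = {
    name: " ".encode(name)
    for name in ("ascii", "iso-8859-1", "shift_jis", "cp037")
}

# 2. An HTML tag encoded as EBCDIC and then misread by an ASCII-based decoder.
#    The byte for '<' in cp037 is 0x4C, which ISO-8859-1 reads as 'L'.
ebcdic_tag = "<p>".encode("cp037")
misread_tag = ebcdic_tag.decode("iso-8859-1")

# 3. A Shift_JIS page decoded with the ISO-8859-1 guess:
#    the ASCII markup survives, the Japanese text does not.
page = "<p>日本語</p>"
raw = page.encode("shift_jis")
guessed = raw.decode("iso-8859-1")

if __name__ == "__main__":
    for name, encoded in space_bytes.items():
        print(f"space in {name}: {encoded.hex()}")
    print("EBCDIC '<p>' misread as:", repr(misread_tag))
    print("tags intact:", guessed.startswith("<p>") and guessed.endswith("</p>"))
    print("text intact:", guessed == page)
```

For the original question, the first comparison is the relevant one: a space typed into the form is encoded in whatever character set the browser is using for the page, so the server sees a space only if it decodes the submitted bytes with that same character set.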