If no charset parameter is specified in the Content-Type header, RFC2616 section 3.7.1 seems to imply ISO8859-1 should be assumed for media types of subtype "text":
When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value.
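Read strictly, the rule quoted above amounts to a small lookup: an explicit charset parameter wins, any text/* type without one defaults to ISO-8859-1, and other types get no default at all. The sketch below assumes a hypothetical helper named effective_charset and uses deliberately naive parameter parsing (it does not handle quoted semicolons or other edge cases of the header grammar).

```python
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Hypothetical helper: the charset a strict RFC 2616 reading implies."""
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()
    # An explicit charset parameter always takes precedence.
    for param in parts[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    # No parameter: RFC 2616 section 3.7.1 defaults text/* to ISO-8859-1.
    if media_type.startswith("text/"):
        return "iso-8859-1"
    # Other media types (e.g. application/x-javascript) get no default.
    return None


print(effective_charset("text/html"))                 # iso-8859-1
print(effective_charset("text/html; charset=UTF-8"))  # utf-8
print(effective_charset("application/x-javascript"))  # None
```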
However, I routinely see applications that serve JavaScript files with Content-Type values like "application/x-javascript" (i.e. no charset parameter), even when these scripts contain non-ASCII UTF-8 characters, which would be garbled if interpreted as ISO-8859-1.
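To make the corruption concrete: a non-ASCII character such as "é" is two bytes in UTF-8, and ISO-8859-1 maps each of those bytes to its own character. A minimal illustration:

```python
# "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
utf8_bytes = "café".encode("utf-8")

# ISO-8859-1 maps every byte to exactly one character,
# so 0xC3 becomes "Ã" and 0xA9 becomes "©".
misread = utf8_bytes.decode("iso-8859-1")

print(misread)  # cafÃ©
```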
This does not seem to pose problems to clients. How do clients know to interpret the bytes as UTF-8? Is there a rule for other character-data subtypes that implies UTF-8 should be the default? Where is this documented?
All major browsers I’ve checked (IE, Firefox and Opera) completely ignore this part of the RFC.
If you are interested in the algorithm used to auto-detect the charset from the data itself, see the Mozilla Firefox link below.
Just a small note about content types: only text types carry a character set. It’s reasonable to assume that browsers handle application/x-javascript the same way they handle text/javascript (except IE6, but that’s another subject).
Internet Explorer will use the default charset (probably stored in the registry), as noted:
Source: http://msdn.microsoft.com/en-us/library/ms537500%28VS.85%29.aspx
Mozilla Firefox attempts to auto-detect the charset, as described here:
Source: http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
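Mozilla's Universal Charset Detector is statistical and far more elaborate, but a much simpler heuristic already explains why unlabeled UTF-8 scripts usually decode correctly: multi-byte UTF-8 sequences are so structured that non-ASCII text which decodes cleanly as UTF-8 is very likely UTF-8. The sketch below is that simpler heuristic, not Mozilla's algorithm; the function name guess_charset is hypothetical, and it assumes ISO-8859-1 as the final fallback.

```python
import codecs


def guess_charset(data: bytes) -> str:
    """Hypothetical BOM + UTF-8-validity heuristic (NOT Mozilla's detector)."""
    # 1. A byte-order mark is an unambiguous signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # 2. If the bytes form valid UTF-8, assume UTF-8 (pure ASCII also lands here).
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Fall back to ISO-8859-1, which accepts any byte sequence.
    return "iso-8859-1"


body = "café".encode("utf-8")
print(body.decode(guess_charset(body)))  # café
```

Note that a Latin-1 "é" (the single byte 0xE9) is not valid UTF-8 on its own, so the same function sends ISO-8859-1 content to the fallback branch.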
Opera uses auto-detection too, as documented:
Source: http://www.opera.com/docs/specs/opera9/