An application I’m maintaining loads user agents extracted from web logs into a MySQL table column using the ‘latin1’ charset. Occasionally, it fails to load a user agent that looks like this:
Mozilla/5.0 (Iâ?; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML^C like Gecko) Version
I suspect it’s choking on Iâ?. I’m trying to work out whether this should be supported, or whether it’s corruption introduced by the upstream logging system. Is this a legal user agent in an HTTP header?
RFC 2616 (HTTP/1.1) says that message header contents must be “consisting of either *TEXT or combinations of token, separators, and quoted-string”. If you look at the definitions for TEXT etc. you will find that legal characters are those with byte values not in the [0, 31] range and not equal to 127; therefore characters such as â are, as far as I can tell, legal as per the spec.
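To make the byte rule concrete, here is a minimal Python sketch of the check described above. The function name `is_rfc2616_text` is made up for illustration, and the check is deliberately simplified: it rejects every octet in [0, 31] and 127, and ignores the allowance the spec makes for linear whitespace (tabs and folded lines) inside TEXT.

```python
def is_rfc2616_text(value: bytes) -> bool:
    """Return True if no octet is a control character (0-31 or 127).

    Simplified reading of the RFC 2616 TEXT rule; the LWS exception
    (horizontal tab, folded lines) is intentionally not handled here.
    """
    return all(octet > 31 and octet != 127 for octet in value)


# The user agent fragment from the question, encoded as latin1 bytes.
fragment = "Mozilla/5.0 (Iâ?; CPU iPhone OS 5_0_1 like Mac OS X)".encode("latin-1")
print(is_rfc2616_text(fragment))        # True: â is byte 0xE2, outside the control range
print(is_rfc2616_text(b"bad\x00value")) # False: NUL is a control character
```

Under this reading, a byte such as 0xE2 passes, so a failed load would more likely come from how the bytes are decoded or stored than from the header being illegal.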