I’m playing a bit in Java with the URLDecoder class to analyze some URLs, and I’ve hit an issue. I’m not sure if it’s a bug or expected behavior, so here it is.
Consider this URL:
https://id2.s.nfl.com/fans/mobile/login?gigyresp=true&city=S%u00e3o+Paulo%2c+Brazil&profileURL=…
URLDecoder is choking on the “São Paulo” part, specifically the “ã”, which is encoded as “%u00e3”. Pretty much everything else seems to be handled fine, but this particular sequence isn’t.
I’m using the following:
URLDecoder.decode(url, "UTF-8");
My stack trace is:
Caused by: java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u0"
at java.net.URLDecoder.decode(URLDecoder.java:173)
Any thoughts on how I could make URLDecoder parse this correctly?
URL encoding is done with octets, each written as %XX (for example %AB). Your encoding seems to be a mix of Java string encoding (\u00e3) and URL encoding (%xx), giving %u00e3, which is not valid.
If you change the string to "S\u00e3o Paulo, Brazil" and encode it with URLEncoder.encode(url, "UTF-8"), you will get S%C3%A3o+Paulo%2C+Brazil, which is perfectly decodable.
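If you cannot change whatever produces the URL, one possible workaround is to rewrite the non-standard %uXXXX escapes as regular UTF-8 percent-encoding before calling URLDecoder. The following is only a sketch under that assumption: the class name PercentUDecoder and the helper normalizePercentU are made up for illustration, and it assumes every %uXXXX sequence carries exactly four hex digits representing one UTF-16 code unit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentUDecoder {

    // One or more consecutive %uXXXX escapes. Matching a whole run keeps
    // surrogate pairs together so they are encoded as a single character.
    private static final Pattern PERCENT_U_RUN =
            Pattern.compile("(?:%u[0-9a-fA-F]{4})+");

    // Hypothetical helper: replaces each run of %uXXXX escapes with the
    // standard UTF-8 percent-encoding of the characters it represents.
    static String normalizePercentU(String input) throws UnsupportedEncodingException {
        Matcher m = PERCENT_U_RUN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String run = m.group();
            StringBuilder chars = new StringBuilder();
            // Each escape is 6 characters long: '%', 'u', and 4 hex digits.
            for (int i = 0; i < run.length(); i += 6) {
                chars.append((char) Integer.parseInt(run.substring(i + 2, i + 6), 16));
            }
            String encoded = URLEncoder.encode(chars.toString(), "UTF-8");
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String raw = "city=S%u00e3o+Paulo%2c+Brazil";
        String normalized = normalizePercentU(raw);
        // The normalized string contains only %XX escapes, so URLDecoder accepts it.
        System.out.println(normalized);
        System.out.println(URLDecoder.decode(normalized, "UTF-8"));
    }
}
```

With the sample query fragment from the question, the normalized string should contain %C3%A3 in place of %u00e3, and decoding it should no longer throw. Fixing the encoding at the source is still the cleaner option if you control it.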