Inside a web application, I am processing requests to a URL like
http://example.com/<website-base-url>
I am logging the raw GET parameter of each request to a UTF-8 database column and to the file system. For a few Chinese domains, I get requests with a website-base-url parameter like
%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn
Decoding with urldecode returns
ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
This does not seem to be the domain name the user wants to request.
I have tried URL encoding, Base64, UTF-8, and combinations of these, without success.
Any suggestions on how to decode the given parameter to UTF-8?
URL percent-encoding simply encodes the raw bytes. It gives you no hint about the actual encoding of the text. If you do not know which encoding those bytes represent, all you can do is guess.
GB18030 seems to be the best candidate, but even the string decoded that way looks a bit too repetitive to be meaningful Chinese.
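As a minimal sketch of that guessing process (in Python, since the question does not show its own code), you can percent-decode the parameter to raw bytes and then try a few plausible encodings, keeping every one that decodes without error. The candidate list below is an assumption chosen for illustration, not an exhaustive or authoritative set.

```python
from urllib.parse import unquote_to_bytes

# Percent-encoded host value copied from the question.
ENCODED = (
    "%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7"
    "%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4"
    "%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn"
)

# Assumed guesses only; percent-encoding itself says nothing about the charset.
CANDIDATES = ("utf-8", "gb18030", "big5", "shift_jis", "latin-1")


def candidate_decodings(raw, encodings=CANDIDATES):
    """Return {encoding: text} for every encoding that decodes raw without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; drop the guess.
            pass
    return results


if __name__ == "__main__":
    # Step 1: percent-decoding only recovers bytes, not text.
    raw = unquote_to_bytes(ENCODED)
    print(raw[:8])
    # Step 2: interpret those bytes under each guessed encoding.
    for enc, text in candidate_decodings(raw).items():
        print(f"{enc:10} {text}")
```

A codec that decodes without error only proves the bytes are valid in that encoding; Latin-1, for instance, accepts any byte sequence. Someone still has to judge which output actually reads as real text.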