I have a unicode string like ‘%C3%A7%C3%B6asd+fjkls%25asd’ and I want to decode this string.

Asked: May 20, 2026

I have a Unicode string like '%C3%A7%C3%B6asd+fjkls%25asd' and I want to decode it.
I used urllib.unquote_plus(str), but it gives the wrong result:

expected: çöasd+fjkls%asd
result:   Ã§Ã¶asd fjkls%asd

The UTF-8 characters that are encoded as two bytes each (%C3%A7 and %C3%B6) are decoded wrongly.
My Python version is 2.7, on a Linux distro.
What is the best way to get the expected result?

Editorial Team · Answer 1 · 2026-05-20T10:24:12+00:00

You have 3 or 4 or 5 problems, but repr() and unicodedata.name() are your friends: they show you unambiguously exactly what you have got, without the confusion caused when people with different console encodings compare the output of their print statements.

Summary: either (a) you start with a unicode object and apply the unquote function to it, or (b) you start with a str object and your console encoding is not UTF-8.
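Both failure modes can be reproduced in Python 3. This is a sketch; the explicit `encoding='latin-1'` arguments are there only to mimic the mistakes. Case (a) percent-decodes text so that each escaped byte becomes one Latin-1 character; case (b) takes the correct UTF-8 bytes and decodes them the way a Latin-1 console would. Both produce the same mojibake, and the usual repair is to re-encode with Latin-1 (which maps code points 0 to 255 back to identical bytes) and then decode as UTF-8.

```python
from urllib.parse import unquote

s0 = '%C3%A7%C3%B6asd+fjkls%25asd'
utf8_bytes = b'\xc3\xa7\xc3\xb6asd+fjkls%asd'   # what the escapes really encode

# (a) Percent-decode as text, one Latin-1 character per escaped byte.
mojibake_a = unquote(s0, encoding='latin-1')

# (b) Correct bytes, displayed by a console that assumes Latin-1.
mojibake_b = utf8_bytes.decode('latin-1')

# Both are 'Ã§Ã¶asd+fjkls%asd'.
assert mojibake_a == mojibake_b == '\xc3\xa7\xc3\xb6asd+fjkls%asd'

# Repair: Latin-1 turns code points 0-255 back into the same bytes; then decode as UTF-8.
repaired = mojibake_a.encode('latin-1').decode('utf-8')
assert repaired == '\xe7\xf6asd+fjkls%asd'   # 'çöasd+fjkls%asd'
```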

If, as you say, you start off with a unicode object:

>>> s0 = u'%C3%A7%C3%B6asd+fjkls%25asd'
>>> print repr(s0)
u'%C3%A7%C3%B6asd+fjkls%25asd'

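this is accidental nonsense: percent-escapes stand for bytes, so they belong in a str object, not a unicode object. If you apply urllibX.unquote_YYYY() (that is, unquote or unquote_plus from urllib or urllib2) to it, you get another nonsensical unicode object (with unquote, u'\xc3\xa7\xc3\xb6asd+fjkls%asd'), in which each UTF-8 byte has become a separate character; printing that causes exactly the symptoms you show. You should convert your original unicode object to a str object immediately: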

>>> s1 = s0.encode('ascii')
>>> print repr(s1)
'%C3%A7%C3%B6asd+fjkls%25asd'

then you should unquote it:

>>> import urllib2
>>> s2 = urllib2.unquote(s1)
>>> print repr(s2)
'\xc3\xa7\xc3\xb6asd+fjkls%asd'
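For comparison, a Python 3 sketch of the same two steps uses `urllib.parse.unquote_to_bytes`, which returns bytes just as Python 2's unquote returns a str here. It also shows where the space in the question's result comes from: `unquote_plus` additionally turns '+' into ' ', so plain unquote is the right call when '+' must survive.

```python
from urllib.parse import unquote_plus, unquote_to_bytes

s0 = '%C3%A7%C3%B6asd+fjkls%25asd'

# Step 1: a correctly percent-encoded string is pure ASCII; anything else raises here.
s1 = s0.encode('ascii')

# Step 2: undo the %XX escapes, giving raw bytes (the analogue of the Python 2 str s2).
s2 = unquote_to_bytes(s1)
assert s2 == b'\xc3\xa7\xc3\xb6asd+fjkls%asd'

# unquote_plus (with its default UTF-8 decoding) also maps '+' to a space.
assert unquote_plus(s0) == '\xe7\xf6asd fjkls%asd'
```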

The first four bytes of s2 show that it is encoded in UTF-8: \xc3\xa7 is the two-byte encoding of ç, and \xc3\xb6 that of ö. If you do print s2, it will look OK if your console expects UTF-8, but if it expects ISO-8859-1 (aka latin1) you’ll see your symptomatic rubbish: each byte is shown as a separate character, so the first one is A-tilde (Ã). Let’s park that thought for a moment and convert it to a unicode object:

>>> s3 = s2.decode('utf8')
>>> print repr(s3)
u'\xe7\xf6asd+fjkls%asd'
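The steps so far (encode to ASCII, unquote, decode as UTF-8) can be condensed into one function. This is a sketch: the name `decode_percent_utf8` is hypothetical, and the import fallback is an assumption meant to let the same code run on Python 2.7 (where urllib.unquote on a str returns a str) and on Python 3 (where urllib.parse.unquote_to_bytes returns bytes).

```python
try:  # Python 3
    from urllib.parse import unquote_to_bytes as _unquote_to_bytes
except ImportError:  # Python 2.7: urllib.unquote on a str returns a str (bytes)
    from urllib import unquote as _unquote_to_bytes


def decode_percent_utf8(text):
    """Percent-decode text and interpret the resulting bytes as UTF-8.

    '+' is left alone (unlike unquote_plus), matching the expected output.
    """
    raw = text.encode('ascii')           # escapes must be pure ASCII
    return _unquote_to_bytes(raw).decode('utf-8')
```

Invalid UTF-8 such as '%FF' raises UnicodeDecodeError rather than silently producing mojibake, which is usually what you want when the input is supposed to be UTF-8.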

Now inspect s3 to see what we’ve actually got:

>>> import unicodedata
>>> for c in s3[:6]:
...     print repr(c), unicodedata.name(c)
...
u'\xe7' LATIN SMALL LETTER C WITH CEDILLA
u'\xf6' LATIN SMALL LETTER O WITH DIAERESIS
u'a' LATIN SMALL LETTER A
u's' LATIN SMALL LETTER S
u'd' LATIN SMALL LETTER D
u'+' PLUS SIGN
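Where do U+00E7 and U+00F6 come from? A two-byte UTF-8 sequence is a lead byte of the form 110xxxxx followed by a continuation byte of the form 10xxxxxx, and the code point is the 11 x bits joined together. The Python 3 sketch below decodes the first four bytes of s2 by hand; `decode_two_byte_utf8` is a hypothetical helper for illustration only and does not check for overlong forms.

```python
def decode_two_byte_utf8(lead, cont):
    """Hand-decode one 2-byte UTF-8 sequence (illustration only)."""
    assert (lead & 0b11100000) == 0b11000000, "lead byte must look like 110xxxxx"
    assert (cont & 0b11000000) == 0b10000000, "continuation byte must look like 10xxxxxx"
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return chr(((lead & 0b00011111) << 6) | (cont & 0b00111111))

s2 = b'\xc3\xa7\xc3\xb6asd+fjkls%asd'

first = decode_two_byte_utf8(s2[0], s2[1])   # 0xC3 0xA7 -> U+00E7
second = decode_two_byte_utf8(s2[2], s2[3])  # 0xC3 0xB6 -> U+00F6
assert first + second == s2[:4].decode('utf-8')
```

For 0xC3 0xA7: 0xC3 & 0x1F = 0x03, shifted left by 6 gives 0xC0, and 0xA7 & 0x3F = 0x27, so the code point is 0xC0 | 0x27 = 0xE7.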

That matches what you said you expected. Now we come to the question of displaying it on your console. Note: don’t freak out when you see “cp850”; I’m doing this portably and just happen to be running it in a Command Prompt on Windows.

>>> import sys
>>> sys.stdout.encoding
'cp850'
>>> print s3
çöasd+fjkls%asd

Note: print implicitly encoded the unicode object using sys.stdout.encoding. Fortunately all the characters in s3 are representable in that encoding (and also in cp1252 and latin1).
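To do that encoding explicitly instead of relying on print, here is a Python 3 sketch; the helper `console_bytes` is hypothetical. It encodes with the console's encoding and `errors='replace'`, so characters the console cannot represent come out as '?' instead of raising UnicodeEncodeError. The byte values checked below are those of ç and ö in cp850, latin1 and UTF-8.

```python
import sys

def console_bytes(text, encoding=None):
    """Encode text for a console; unrepresentable characters become '?'."""
    # Fall back to ASCII if stdout declares no encoding (e.g. redirected output).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, 'replace')

s3 = '\xe7\xf6asd+fjkls%asd'   # 'çöasd+fjkls%asd'

assert console_bytes(s3, 'cp850')[:2] == b'\x87\x94'
assert console_bytes(s3, 'latin-1')[:2] == b'\xe7\xf6'
assert console_bytes(s3, 'utf-8')[:4] == b'\xc3\xa7\xc3\xb6'
assert console_bytes('\u20ac', 'latin-1') == b'?'   # the euro sign has no latin1 byte
```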
