I have some text coming from the web as such:
Â£6.49
Obviously I would like this to be displayed as:
£6.49
I have tried the following so far:
s = url['title']
s = s.encode('utf8')
s = s.replace(u'Â','')
I also tried a few variants on this (after finding it on this very same forum), but still no luck, as I keep getting:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 100: ordinal not in range(128)
Could anyone help me get this right?
UPDATE:
Adding the repr examples and content type
u'Star Trek XI £3.99'
u'Oscar Winners Best Pictures Box Set \xc2\xa36.49'
Content-Type: text/html; charset=utf-8
Thanks in advance.
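If s = url['title'] makes s equal to this:
u'Oscar Winners Best Pictures Box Set \xc2\xa36.49'
then the problem is one of two things:
Case 1: url is being defined incorrectly, or
Case 2: the content coming from the web is mal-formed.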
If Case 1, we'd need to see the code that defines url.
If Case 2, a quick-and-dirty workaround would be to encode the unicode object s with the raw-unicode-escape codec, i.e. s.encode('raw-unicode-escape').
See also this SO question.
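The raw-unicode-escape workaround described above can be sketched roughly as follows. The helper name fix_double_encoded is hypothetical, and the final .decode('utf-8') step is an addition so that the same lines behave identically on Python 2 and Python 3 and return text rather than bytes. This assumes every character in the damaged string is below U+0100; anything higher would be written out as a \uXXXX escape and the trick would not apply.

```python
# -*- coding: utf-8 -*-

def fix_double_encoded(text):
    """Hypothetical helper: undo text whose UTF-8 bytes were
    mistakenly turned into one code point per byte."""
    # Step 1: map each code point below 256 back to the single byte
    # with the same value (u'\xc2\xa3' -> the two bytes C2 A3).
    raw = text.encode('raw-unicode-escape')
    # Step 2: interpret those bytes as the UTF-8 they really were.
    return raw.decode('utf-8')

# The repr shown in the question's update.
s = u'Oscar Winners Best Pictures Box Set \xc2\xa36.49'
fixed = fix_double_encoded(s)
```

On Python 2, step 1 alone already yields the byte string '\xc2\xa3' mentioned in this answer, which prints correctly on a UTF-8 terminal.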
Regarding titles like s = u'Star Trek XI £3.99': again, it would be nice to fix the problem before it gets to this stage, perhaps by looking at how url is defined. But assuming the content from the web is mal-formed, a workaround applied to s itself would again be needed.

A little bit of explanation:
Note that u'£'.encode('utf-8') returns '\xc2\xa3'. In other words, the unicode object u'£', encoded with the utf-8 codec, becomes the string object '\xc2\xa3'.

Somehow, url['title'] is getting defined to be the unicode object u'\xc2\xa3'. (The u makes a big difference!) Thus we have u'\xc2\xa3' when we desire '\xc2\xa3'.

Encoding the unicode object u'\xc2\xa3' with the raw-unicode-escape codec transforms it to '\xc2\xa3'.
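To make the explanation concrete, here is a small sketch of the whole round trip. The latin-1 decode in the middle is only an assumption about how the damage could have happened (the actual cause inside url is not known). The variable names are illustrative, and the b'' literals are used so the snippet runs on both Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-

pound = u'\xa3'  # the pound sign as a single unicode code point

# Correct encoding: one code point becomes two UTF-8 bytes.
utf8_bytes = pound.encode('utf-8')

# Assumed source of the damage: the UTF-8 bytes get decoded with a
# one-byte-per-character codec, giving two code points instead of one.
damaged = utf8_bytes.decode('latin-1')

# The workaround: turn each code point back into its byte value,
# then decode those bytes as UTF-8.
restored_bytes = damaged.encode('raw-unicode-escape')
restored = restored_bytes.decode('utf-8')
```

Here damaged is the two-character unicode object u'\xc2\xa3' from the question, and restored_bytes is the two-byte string the answer is aiming for.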