I am fetching web responses in various encodings using Python, and my output needs to match what is shown on the web page.
Ex: Marc Barbé
The last character should remain the same after the HTML response is parsed.
Currently I am using the following code for this:
unicode.join(u'\n',map(unicode,item))
In some cases, when no special encoding is specified, it throws the following error:
Ex: Markus Rygaard, Alberte Blichfeldt, Flemming Quist, Møller
Traceback (most recent call last):
File "BFICrawl.py", line 20, in <module>
print attrName + " : " + attrValue
File "C:\Python27\LIB\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xf8' in position 60: character maps to <undefined>
I am really not able to find the reason for this. Is there an alternative way to get the content from the web with its encoding intact?
You have successfully obtained unicode objects from the web. You should not need to do things like unicode.join(u'\n',map(unicode,item)). The problem happens when you try to output the unicode.
You are running your script in a Windows “Command Prompt” window. The script is printing to the console. The console encoding is cp437. That is a very limited (8-bit) encoding. It can’t handle the second character in Møller, and an enormous bunch of other characters.
Remedy: Run your script in IDLE (supplied with your Python) or some other IDE.
Alternatively, if you are printing to the console for debug purposes only, instead of print foo use print repr(foo).
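To see why one name prints and the other does not, here is a minimal sketch that checks each string against cp437 directly, without needing a Windows console. The helper names can_encode and console_safe are made up for illustration, and the 'replace' fallback is an extra option that goes beyond the two remedies above; treat it as one possible workaround rather than the recommended fix.

```python
# Runs on Python 2.7 and Python 3; non-ASCII characters are written as escapes.
name_ok = u'Marc Barb\xe9'   # e-acute exists in cp437
name_bad = u'M\xf8ller'      # u'\xf8' (o with stroke) does not exist in cp437


def can_encode(text, encoding):
    """Return True if every character of text can be encoded (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def console_safe(text, encoding='cp437'):
    """Swap characters the encoding lacks for '?' (hypothetical helper)."""
    return text.encode(encoding, 'replace').decode(encoding)


print(can_encode(name_ok, 'cp437'))   # True
print(can_encode(name_bad, 'cp437'))  # False
print(console_safe(name_bad))         # M?ller
```

The replacement is lossy, so it only suits console output; the unicode objects themselves are fine and can be written to a file in UTF-8 unchanged.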