- OS: Windows 7, 64-bit
- Python 3.1.3
When I try to do this
os.listdir("F:\\music")
I get this
UnicodeEncodeError: 'gbk' codec can't encode character '\xe3' in position 643: illegal multibyte sequence
os.listdir works with other directories so the cause of the problem is obviously some strangely-encoded file or folder within F:\music itself. How do I find the source of this error?
UnicodeEncodeError indicates that the failure happens when you print the filenames, not when you list them. If os.listdir() itself had a problem, you would see a UnicodeDecodeError (Decode, not Encode).
Because you pass a Unicode pathname, os.listdir() returns filenames that are already decoded. On Windows the filesystem uses UTF-16 to encode filenames, and those are easily decoded in Python (sys.getfilesystemencoding() tells Python what codec to use).
However, the Windows console uses a different encoding, in your case gbk, and that codec cannot represent all the characters that UTF-16 can encode.
So you are looking for a print() call here. You could perhaps use print(filename.encode('gbk', errors='replace')) to print the filenames instead; characters that gbk cannot encode are replaced by a question mark. Note that print() shows the result as a bytes literal; append .decode('gbk') to get plain text back.
Alternatively, you could pass b'F:\\music' as the path and work with raw bytestrings instead of Unicode.
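To find out which entries actually trip up the console codec, you can try encoding each name yourself and collect the ones that fail. The following is only a rough sketch: the helper names (unencodable_chars, find_problem_names, report) are made up for illustration, and falling back to 'ascii' when sys.stdout.encoding is unavailable is an assumption, not something the question specifies.

```python
import os
import sys


def unencodable_chars(name, encoding):
    """Return the characters in name that the given codec cannot encode."""
    bad = []
    for ch in name:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


def find_problem_names(names, encoding):
    """Return (name, bad_chars) pairs for every name the codec rejects."""
    problems = []
    for name in names:
        bad = unencodable_chars(name, encoding)
        if bad:
            problems.append((name, bad))
    return problems


def report(path, encoding=None):
    """Print the offending entries of a directory in a console-safe form."""
    # Assumption: use the console's codec, or 'ascii' if it is not set
    # (for example when the output is redirected).
    encoding = encoding or sys.stdout.encoding or "ascii"
    for name, bad in find_problem_names(os.listdir(path), encoding):
        # ascii() escapes every non-ASCII character, so this print cannot
        # raise UnicodeEncodeError whatever the console encoding is.
        print(ascii(name), "cannot encode:", ascii(bad))
```

Calling report("F:\\music") should then list only the names that gbk cannot represent, with the offending characters shown as escapes such as '\xe3'.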