According to the documentation, shlex should support Unicode as of Python 2.7.3. However, when I run the code below, I get: UnicodeEncodeError: 'ascii' codec can't encode characters in position 184-189: ordinal not in range(128)
Am I doing something wrong?
import shlex
command_full = u'software.py -fileA="sequence.fasta" -fileB="新建文本文档.fasta.txt" -output_dir="..." -FORMtitle="tst"'
shlex.split(command_full)
The exact error is as follows:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 275, in split
    lex = shlex(s, posix=posix)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 25, in __init__
    instream = StringIO(instream)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 44-49: ordinal not in range(128)
This output is from my Mac, using Python from MacPorts. I get exactly the same error on an Ubuntu machine with the “native” Python 2.7.3.
The shlex.split() code wraps both unicode() and str() instances in a StringIO() object, which can only handle Latin-1 bytes (so not the full Unicode codepoint range).

You'll have to encode (UTF-8 should work) if you still want to use shlex.split(); what the maintainers of the module meant is that unicode() objects are supported now, just not anything outside the Latin-1 range of codepoints. Encoding to UTF-8, splitting, and then decoding each resulting part back to unicode works for the command in the question.
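A minimal sketch of the encode/split/decode workaround described above. The helper name split_unicode and its encoding parameter are hypothetical, not part of shlex. It branches on the interpreter version, assuming that on Python 3 shlex.split() accepts str directly, so the round-trip is only needed on Python 2.7. The file name is written with \u escapes so the snippet also runs as a Python 2 source file without a coding declaration.

```python
import shlex
import sys


def split_unicode(command, encoding='utf-8'):
    """Split a unicode command line into a list of unicode arguments.

    split_unicode is a hypothetical helper, not part of the shlex module.
    """
    if sys.version_info[0] >= 3:
        # Python 3: shlex works on str (unicode) natively.
        return shlex.split(command)
    # Python 2.7: hand shlex a byte string instead of a unicode object,
    # split it, then decode every resulting part back to unicode.
    parts = shlex.split(command.encode(encoding))
    return [part.decode(encoding) for part in parts]


# The \u escapes spell the file name from the question.
command_full = (u'software.py -fileA="sequence.fasta" '
                u'-fileB="\u65b0\u5efa\u6587\u672c\u6587\u6863.fasta.txt" '
                u'-output_dir="..." -FORMtitle="tst"')
print(split_unicode(command_full))
```

Because shlex.split() runs in POSIX mode by default, the double quotes are stripped, so each argument comes back as, for example, -fileA=sequence.fasta.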
A now-closed Python issue tried to address this, but the module is very byte-stream oriented, and no new patch has materialized. For now, using iso-8859-1 or UTF-8 encoding is the best I can come up with for you.
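Which of the two encodings is usable depends on the text. iso-8859-1 can only represent code points up to U+00FF, so it fails for the CJK file name in the question, while UTF-8 can encode any code point. The sketch below checks this; fits_latin1 is a hypothetical helper written for illustration.

```python
def fits_latin1(text):
    """Return True if every character of text can be encoded as iso-8859-1.

    fits_latin1 is a hypothetical helper, not a library function.
    """
    try:
        text.encode('iso-8859-1')
    except UnicodeEncodeError:
        # At least one code point lies above U+00FF.
        return False
    return True


# u'caf\u00e9' is 'café'; the second string is the file name from the question.
print(fits_latin1(u'caf\u00e9.fasta'))
print(fits_latin1(u'\u65b0\u5efa\u6587\u672c\u6587\u6863.fasta.txt'))
```

For a command line like the one in the question, UTF-8 is therefore the encoding to pick.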