According to the documentation, in Python 2.7.3, shlex should support UNICODE. However, when running

Question

Asked: June 17, 2026 (2026-06-17T03:27:15+00:00)

According to the documentation, shlex should support Unicode in Python 2.7.3. However, when I run the code below, I get: UnicodeEncodeError: 'ascii' codec can't encode characters in position 184-189: ordinal not in range(128)

Am I doing something wrong?

import shlex

command_full = u'software.py -fileA="sequence.fasta" -fileB="新建文本文档.fasta.txt" -output_dir="..." -FORMtitle="tst"'

shlex.split(command_full)

The exact error is as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 275, in split
    lex = shlex(s, posix=posix)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 25, in __init__
    instream = StringIO(instream)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 44-49: ordinal not in range(128)

This output is from my Mac, using Python from MacPorts. I get exactly the same error on an Ubuntu machine with the “native” Python 2.7.3.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T03:27:16+00:00

The shlex.split() code wraps both unicode() and str() instances in a StringIO() object. For a unicode() object that means an implicit encode with the default ASCII codec, which is exactly the 'ascii' codec error in your traceback, so the full Unicode codepoint range is not supported.

You’ll have to encode first (UTF-8 should work) if you still want to use shlex.split(); the maintainers of the module meant that unicode() objects are accepted now, just not ones containing characters the implicit encoding cannot handle.
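The UnicodeEncodeError from the traceback can be reproduced without shlex at all. The sketch below is only an illustration: can_encode is a hypothetical helper name, not part of any library. It checks which codecs can represent the file name from the question.

```python
# -*- coding: utf-8 -*-
# File name taken from the command line in the question.
file_name = u'新建文本文档.fasta.txt'


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec.

    Hypothetical helper, used here only for illustration.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Check a few candidate codecs against the Chinese file name.
for codec in ('ascii', 'latin-1', 'utf-8'):
    print('%s: %s' % (codec, can_encode(file_name, codec)))
```

Of the three codecs tried here, only UTF-8 can represent these characters, which is why it is the safer choice for the round trip.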

Encoding, splitting, decoding gives me:

>>> map(lambda s: s.decode('UTF8'), shlex.split(command_full.encode('utf8')))
[u'software.py', u'-fileA=sequence.fasta', u'-fileB=\u65b0\u5efa\u6587\u672c\u6587\u6863.fasta.txt', u'-output_dir=...', u'-FORMtitle=tst']
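The encode, split, decode round trip above could be wrapped in a small helper. This is a minimal sketch, not something shlex provides: split_unicode is a hypothetical name, and the Python 3 branch is an assumption added for portability (the question only concerns Python 2.7.3), relying on Python 3's shlex accepting text directly.

```python
# -*- coding: utf-8 -*-
import shlex
import sys


def split_unicode(command, encoding='utf-8'):
    """Split a unicode command line into a list of unicode arguments.

    Hypothetical helper; the name and the default encoding are assumptions.
    """
    if sys.version_info[0] >= 3:
        # Python 3: shlex operates on text, so no round trip is needed.
        return shlex.split(command)
    # Python 2: encode to bytes, split, then decode each piece again.
    encoded = command.encode(encoding)
    return [part.decode(encoding) for part in shlex.split(encoded)]


command_full = u'software.py -fileB="新建文本文档.fasta.txt" -FORMtitle="tst"'
parts = split_unicode(command_full)
```

As in the one-liner, the quotes are consumed by shlex and each argument comes back as a unicode string.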

A now-closed Python issue tried to address this, but the module is very byte-stream oriented, and no new patch has materialized. For now, encoding to UTF-8 (or to ISO-8859-1, if your text fits in that range) is the best I can come up with for you.
