(The following uses Python 2.6.1.)
I have two strings:
>>> a = u'\u05e8\u05db\u05e1'
>>> b = u'\u05e8\u05db\u05e1 \u05d4\u05d9\u05d0 \u05de\u05d0\u05d9\u05e8\u05d4 \u05d1\u05e4\u05e0\u05e1'
I encode them:
>>> ua = a.encode('utf-8')
>>> ub = b.encode('utf-8')
>>> ua
'\xd7\xa8\xd7\x9b\xd7\xa1'
>>> ub
'\xd7\xa8\xd7\x9b\xd7\xa1 \xd7\x94\xd7\x99\xd7\x90 \xd7\x9e\xd7\x90\xd7\x99\xd7\xa8\xd7\x94 \xd7\x91\xd7\xa4\xd7\xa0\xd7\xa1'
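UTF-8 encodes every Hebrew letter (U+05D0 to U+05EA) as two bytes, so `ua` is exactly the first six bytes of `ub`. Nothing about the encoded data depends on string length. Here is a minimal sketch of that check, written for Python 3, where `str` is Unicode and `.encode()` returns `bytes` (the equivalent of Python 2's `str`):

```python
# Python 3 sketch: literals are Unicode str; .encode() returns bytes.
a = '\u05e8\u05db\u05e1'
b = ('\u05e8\u05db\u05e1 \u05d4\u05d9\u05d0 '
     '\u05de\u05d0\u05d9\u05e8\u05d4 \u05d1\u05e4\u05e0\u05e1')

ua = a.encode('utf-8')
ub = b.encode('utf-8')

# 3 Hebrew letters become 6 bytes; 15 letters plus 3 spaces become 33 bytes.
print(len(a), len(ua))
print(len(b), len(ub))

# The encoded short string is a byte-for-byte prefix of the encoded long one.
print(ub.startswith(ua))
```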
and try to print them:
>>> print ua
רכס
>>> print ub
רכס היא מאירה בפנס
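In Python 2, `print` on a byte string (`str`) writes the bytes to the output unchanged. The terminal alone decides how to turn them into glyphs. The sketch below imitates that in Python 3 with a hypothetical helper, `print_raw`, and captures the output in an `io.BytesIO` so you can inspect exactly what would reach the terminal:

```python
import io

ua = '\u05e8\u05db\u05e1'.encode('utf-8')

def print_raw(data, stream):
    """Hypothetical helper: write `data` (bytes) plus a newline to a binary
    stream with no re-encoding, roughly what Python 2's `print ua` does."""
    stream.write(data + b'\n')
    stream.flush()

# Capture the bytes instead of sending them to a real terminal.
captured = io.BytesIO()
print_raw(ua, captured)
print(captured.getvalue())
```

To send the bytes to an actual terminal, pass `sys.stdout.buffer` as the stream.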
Why does ub print in Hebrew characters while ua doesn’t? ua is just the first three characters of ub (its first six bytes), so it seems as though string length is somehow the problem, which is strange.
(For the record, this came up while parsing a webpage with BeautifulSoup; I couldn’t tell why some paragraphs came out readable while others didn’t.)
It must be something with your terminal settings; ua prints three Hebrew characters on my terminal (Terminal.app on OS X), exactly the rightmost three characters of ub. (Since Hebrew is a right-to-left script, the rightmost three characters are the first three.) For the record, I’ve tried it with Python 2.6.1.
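To illustrate why terminal settings matter: the same six bytes look different depending on which encoding is used to decode them. The following is a rough Python 3 sketch. `render_as` is a hypothetical helper that only approximates what a terminal does. `cp1252` is an arbitrary example of a non-UTF-8 setting, not a claim about the asker's terminal.

```python
import locale
import sys

# The same UTF-8 bytes as ua in the question.
ua = '\u05e8\u05db\u05e1'.encode('utf-8')

def render_as(data, terminal_encoding):
    """Hypothetical helper: approximate what a terminal configured for
    `terminal_encoding` would show for raw bytes."""
    return data.decode(terminal_encoding, errors='replace')

# What Python thinks the output stream and locale encodings are.
print('stdout encoding:', getattr(sys.stdout, 'encoding', None))
print('locale encoding:', locale.getpreferredencoding())

# UTF-8 recovers the Hebrew text; cp1252 turns the same bytes into mojibake.
for enc in ('utf-8', 'cp1252'):
    print(enc, ascii(render_as(ua, enc)))
```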