Following this thread solution, I have managed to get a bunch of lists that

Question

Editorial Team

Asked: May 25, 2026

Following this thread solution, I have managed to get a bunch of lists that each looks like:

[u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9']

I assume those are Unicode characters, but for some reason I can't convert them back into Hebrew.

I tried the solution suggested in the comments at that link. I also tried ''.join, but it didn't work. The error I get is:

Error Type: exceptions.UnicodeEncodeError
22:42:15 T:2806414192 M:2425589760 ERROR: Error Contents: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
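For what it's worth, the message above is exactly what Python produces when this text is encoded with the ASCII codec: "position 0-4" covers the five Hebrew letters before the space. That suggests (an assumption, since the output setup isn't shown) that printing implicitly encodes the text as ASCII. A minimal sketch that reproduces the message; `ascii_error` is a hypothetical helper, and the code should run under both Python 2.7 and Python 3:

```python
def ascii_error(text):
    """Try to encode text as ASCII; return the error message, or None on success."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)
    return None

hebrew = u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'
print(ascii_error(hebrew))
# 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
print(ascii_error(u'plain ASCII'))
# None
```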

I tried wrapping things in unicode(), but all I got was the same output as the example above.

How do I achieve that?

Note:
I am trying to parse this link.

Edit:
I am trying to convert the list into a string using join and then print it. Here is the relevant piece of code:

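from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3, Python 2

soup = BeautifulStoneSoup(link, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
programs = soup('ul')  # shorthand for soup.findAll('ul')
for i, prog in enumerate(programs):
    if i == (4 + getLetterValue(name)):
        # Iterate over the <li> items directly instead of indexing with a counter.
        for li in prog('li'):
            anchor = li('a')[0]  # first <a> in the list item
            url = anchor['href']
            text = anchor.contents  # list of nodes inside the <a>
            print ''.join(text)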
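As far as the snippet shows, the `''.join(text)` step itself should not be the problem: joining unicode pieces just yields one longer unicode string, and nothing is encoded until the result is printed. A small sketch with a plain list standing in for what `.contents` returns (an assumption: the `<a>` element holds only text nodes); `join_contents` is a hypothetical name, and the code should run under Python 2.7 and Python 3:

```python
def join_contents(contents):
    """Join a list of unicode text pieces into a single unicode string."""
    return u''.join(contents)

# Hypothetical stand-in for the .contents list of an <a> element.
pieces = [u'\u05ea\u05d0\u05de\u05d9\u05df', u' ', u'\u05dc\u05d9']
joined = join_contents(pieces)
print(len(joined))  # 8 characters; still text, not bytes
```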

link is a string, and getLetterValue(name) returns an integer that indicates the position in the HTML document.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T03:13:47+00:00

This is a Unicode string. It is already Hebrew, and you can even print it directly in a Python interactive shell, e.g.:

>>> print u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'
תאמין לי
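The `\u` escapes are only the way Python writes out these code points; each one is an ordinary Hebrew letter. A quick way to check, using the standard `unicodedata` module (`describe` is a hypothetical helper name):

```python
import unicodedata

text = u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'

def describe(s):
    """Return the official Unicode name of each character in s."""
    return [unicodedata.name(ch) for ch in s]

for name in describe(text):
    print(name)
# HEBREW LETTER TAV
# HEBREW LETTER ALEF
# HEBREW LETTER MEM
# HEBREW LETTER YOD
# HEBREW LETTER FINAL NUN
# SPACE
# HEBREW LETTER LAMED
# HEBREW LETTER YOD
```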

If you really need to convert it to a raw string of bytes (a str object) for some reason, you have to specify the encoding of the byte string, because the same text can be represented in many different encodings.

Short answer: assuming you want to use UTF-8 to encode the text, you can use:

your_unicode_text.encode('utf-8')
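For example, a minimal sketch of the UTF-8 round trip (in Python 3 the encoded result is a `bytes` object rather than a `str`):

```python
text = u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'

# Each Hebrew letter takes two bytes in UTF-8 and the space takes one.
encoded = text.encode('utf-8')
print(len(encoded))  # 15

# Decoding with the same encoding restores the original text.
decoded = encoded.decode('utf-8')
print(decoded == text)  # True
```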

If you are going to use a different encoding, just change the encoding name above.
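To see why the encoding name matters, here is a sketch comparing how many bytes the same eight characters take in a few encodings; `iso-8859-8` and `cp1255` are single-byte Hebrew encodings picked only for illustration, and `encoded_sizes` is a hypothetical helper:

```python
text = u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'

def encoded_sizes(s, encodings):
    """Map each encoding name to the length in bytes of s encoded with it."""
    return dict((enc, len(s.encode(enc))) for enc in encodings)

sizes = encoded_sizes(text, ['utf-8', 'utf-16-le', 'iso-8859-8', 'cp1255'])
for enc in sorted(sizes):
    print(enc, sizes[enc])

# Bytes decoded with the wrong codec come back as different text (mojibake).
garbled = text.encode('utf-8').decode('latin-1')
print(garbled == text)  # False
```

Whichever encoding is used to produce the bytes must also be used to read them back.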

For a reference on how Python deals with Unicode text and common problems, see: http://docs.python.org/howto/unicode.html

See also this answer for another short explanation of Unicode and string encodings.