Question

Editorial Team

Asked: May 15, 2026

I am getting the very familiar:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 24: ordinal not in range(128)

I have checked multiple posts on SO, and they recommend variable.encode('ascii', 'ignore').

However, this is not working. Even after this I am getting the same error.

The stack trace:

'ascii' codec can't encode character u'\x92' in position 18: ordinal not in range(128)
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 513, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/autominer1/1.343038273644030157/siteinfo.py", line 2160, in post
    imageAltTags.append(str(image["alt"]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in position 18: ordinal not in range(128)

The code responsible for it:

siteUrl = urlfetch.fetch("http://www." + domainName, headers={'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b5) Gecko/2008032620 Firefox/3.0b5'})

webPage = siteUrl.content.decode('utf-8', 'replace').encode('ascii', 'replace')

htmlDom = BeautifulSoup(webPage)

imageTags = htmlDom.findAll('img', {'alt': True})

for image in imageTags:
    if len(image["alt"]) > 3:
        imageAltTags.append(str(image["alt"]))

Any help would be greatly appreciated. Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-05-15T16:06:03+00:00

There are two different things that Python 2 treats as strings: 'raw' byte strings (type str) and 'unicode' strings (type unicode). Only the latter actually represent text. If you have a raw string and you want to treat it as text, you first need to convert it to a unicode string. To do this, you need to know the encoding of the string, that is, the way unicode codepoints are represented as bytes in the raw string, and call .decode(encoding) on the raw string.
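As a minimal sketch of that decoding step (the byte values are an invented sample, not data from the question), the same raw bytes produce different text depending on which encoding you name, which is why you have to know the real one:

```python
# Hypothetical sample: "caf" followed by the two bytes that encode
# U+00E8 (e with grave accent) in UTF-8.
raw = b'caf\xc3\xa8'

# Decoding with the correct encoding yields four characters of text.
text = raw.decode('utf-8')

# Decoding with the wrong encoding does not fail, but each byte becomes
# its own character, so the result is garbage.
wrong = raw.decode('latin-1')

print(repr(text))
print(repr(wrong))
```

The snippet is written so that it runs on both Python 2 and Python 3; in Python 3 the two types are called bytes and str instead of str and unicode.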

When you call str() on a unicode string, the opposite transformation takes place: Python encodes the unicode string as bytes. Since no character set is specified, it defaults to ascii, which is only capable of representing the first 128 codepoints. That is why the traceback points at the str(image["alt"]) call rather than at the line that builds webPage.
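The failure can be reproduced in isolation. This is only a sketch: the alt text is a made-up value containing the u'\x92' character from the traceback, and the explicit .encode('ascii') stands in for what Python 2's str() does implicitly with the default encoding.

```python
# Hypothetical alt text containing the character from the traceback.
alt = u'It\x92s a photo'

try:
    # Equivalent to the implicit conversion str(alt) performs in Python 2.
    alt.encode('ascii')
    failed = False
    message = ''
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)
```

The error is raised for the value being converted, so making webPage ASCII-only beforehand does not help if the parsed attribute value ends up containing a non-ASCII character anyway.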

Instead, you should do one of two things:

1. Represent 'imageAltTags' as a list of unicode strings, and thus drop the str() call. This is probably the best approach.
2. Instead of str(x), call x.encode(encoding). The encoding to use will depend on what you're doing, but the most likely choice is utf-8, e.g. x.encode('utf-8').
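Both options can be sketched as follows. The alt_values list is a hypothetical stand-in for the image["alt"] values that BeautifulSoup returns in the question, and the names image_alt_tags and encoded_alt_tags are illustrative only.

```python
# Hypothetical stand-ins for the image["alt"] values in the question.
alt_values = [u'It\x92s a photo', u'caf\xe8 terrace', u'ok']

# Option 1: keep the values as text. Without str() there is no implicit
# ASCII encode, so nothing can raise UnicodeEncodeError here.
image_alt_tags = [alt for alt in alt_values if len(alt) > 3]

# Option 2: encode explicitly, and only where bytes are really needed
# (for example, just before writing a response).
encoded_alt_tags = [alt.encode('utf-8') for alt in alt_values if len(alt) > 3]

print(len(image_alt_tags), len(encoded_alt_tags))
```

With option 1 the length check still counts characters; with option 2 the stored values are bytes, so any later length or slicing operation counts bytes instead.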
