The Archive Base

Editorial Team
Asked: May 23, 2026

I parsed a file and saved its content in a database using Django. The website was 100% in English, so I naively assumed it would be ASCII all along, and saved the text happily as unicode.

You can guess the rest of the story 🙂

When I print, I get the usual encoding error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 48: ordinal not in range(128)

A quick search tells me that u'\u2019' is the UTF-8 representation of ’.
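
Strictly speaking, \u2019 names a Unicode code point (U+2019, the right single quotation mark), and its UTF-8 representation is a sequence of three bytes. The following is a minimal sketch of that distinction, written in Python 3 syntax as an assumption; in Python 2 the text literal would carry a u prefix and the encoded result would be a plain str.

```python
# Python 3 syntax; in Python 2 the text literal would be written u'...'.
text = "his son\u2019s friend"

# A text string holds code points; U+2019 is a single character here.
print(len(text))            # 16 characters
print(hex(ord(text[7])))    # 0x2019

# Encoding to UTF-8 turns that one code point into three bytes.
data = text.encode("utf-8")
print(data)                 # b'his son\xe2\x80\x99s friend'
print(len(data))            # 18 bytes

# Decoding with the same codec recovers the original text.
assert data.decode("utf-8") == text
```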

repr(string) shows me this:

"u'his son\\u2019s friend'"

Then of course I tried django.utils.encoding.smart_str and a more direct approach using string.encode('utf-8'), and I ended up with something printable. Unfortunately, it prints like this in my (Linux, UTF-8) terminal:

In [76]: repr(string.encode('utf-8'))
Out[76]: "'his son\\xe2\\x80\\x99s friend '"

In [77]: print string.encode('utf-8')
his son�s friend

Not what I expected. I suspect I double encoded something or missed an important point.
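
One way to test the double-encoding suspicion is to compare the bytes against what a real double encoding would produce. The sketch below (Python 3 syntax, an assumption) simulates the common case where UTF-8 bytes are misread as Latin-1 text and encoded again. The repr shown above contains \xe2\x80\x99, which matches the single-encoded form, so the data itself does not look double encoded.

```python
text = "his son\u2019s friend"

# Correct: encode the text exactly once.
once = text.encode("utf-8")

# Double encoding: the UTF-8 bytes are misread as Latin-1 text
# and then encoded to UTF-8 a second time.
twice = once.decode("latin-1").encode("utf-8")

print(once)   # b'his son\xe2\x80\x99s friend'
print(twice)  # b'his son\xc3\xa2\xc2\x80\xc2\x99s friend'

# Only the single-encoded bytes decode back to the original text.
assert once.decode("utf-8") == text
assert twice.decode("utf-8") != text
```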

Of course, the file's original encoding is not published with the file. I guess I could read the HTTP headers or ask the webmaster, but since \u2019s looks like UTF-8, I assumed it was UTF-8. I could be very wrong; tell me if I am.
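
Reading the HTTP headers can be sketched as follows. This is a hedged illustration, not what the question's code does: the helper names charset_from_content_type and decode_body are hypothetical, the sample header and bytes are made up, and the parsing leans on the standard library's email.message.Message to read the charset parameter of a Content-Type value.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def decode_body(raw_bytes, content_type, fallback="utf-8"):
    """Decode a response body using the declared charset, else the fallback."""
    charset = charset_from_content_type(content_type) or fallback
    return raw_bytes.decode(charset)


# Hypothetical response: the same apostrophe is one byte (0x92) in windows-1252.
body = decode_body(b"his son\x92s friend", "text/html; charset=windows-1252")
print(repr(body))
```

If the header names a charset that Python does not know, bytes.decode raises LookupError, so a real caller would probably want to catch that and fall back.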

Solutions are obviously appreciated, but a deep explanation of the cause, and of what to do to keep this from happening again, would be appreciated even more. I often get bitten by encodings, which shows that I still haven't completely mastered the subject.
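
A common habit for avoiding this class of error is to decode bytes into text as soon as they enter the program, work only with text inside, and encode again only when writing out. Below is a minimal sketch of that habit, assuming the data is UTF-8; the helper names read_text and write_text and the temporary file are hypothetical. io.open is used because it accepts an encoding argument in both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Decode at the input boundary: bytes on disk become text in memory."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def write_text(path, text, encoding="utf-8"):
    """Encode at the output boundary: text in memory becomes bytes on disk."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Round trip through a temporary file (the path is only for illustration).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
write_text(path, "his son\u2019s friend")
round_tripped = read_text(path)

with io.open(path, "rb") as handle:
    raw = handle.read()
```

Naming the encoding explicitly at both boundaries means the result does not depend on the locale of the machine that happens to run the code.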

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 2:09 pm

    You are fine. You have the proper data. Yes, the original data is UTF-8 (based on context, U+2019 makes perfect sense as an apostrophe between “son” and “s”). The weird ? error character probably just means that your terminal's font doesn't have a glyph for this character (a fancy apostrophe), or that the terminal is not decoding the output as UTF-8. No big deal. The data will be correct where it counts. If you are nervous, try some different terminal/OS combinations (I'm on OS X using iTerm). I spent a lot of time explaining to my QA guys that the scary question mark character just means they don't have a Chinese font installed on their Windows box (in my case we were testing with Chinese data). Here are some comments:

    # Create a Python unicode object
    # (abstract code points, independent of any encoding).
    # The \u escape with a single backslash tells Python we want to write
    # a character by its Unicode code point number, typed out in ASCII.
    >>> s1 = u'his son\u2019s friend'
    
    # If you just type the name at the prompt,
    # the interpreter does the equivalent of `print repr(s1)`,
    # and since repr means "show it like a string typed into a Python source file",
    # you get your ASCII-escaped version back.
    >>> s1
    u'his son\u2019s friend'
    >>> print repr(s1)
    u'his son\u2019s friend'
    
    # This isn't ASCII, so encoding into ASCII raises your original
    # error, as expected.
    >>> s1.encode('ascii')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)
    
    # Encode to UTF-8 and now we have a byte string (a Python 2 str),
    # which the interpreter displays with hex escapes.
    # Unicode code point U+2019 takes 3 bytes in UTF-8.
    >>> s1.encode('utf-8')
    'his son\xe2\x80\x99s friend'
    
    # My terminal DOES have a glyph (symbol) for this character,
    # so it displays OK for me.
    # Note that my terminal shows a different glyph for a normal ASCII apostrophe
    # (straight vertical).
    >>> print s1
    his son’s friend
    >>> repr(s1)
    "u'his son\\u2019s friend'"
    >>> str(s1.encode('utf-8'))
    'his son\xe2\x80\x99s friend'
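
To see what the interpreter believes about the terminal, it can help to check the encoding attached to standard output. The sketch below uses Python 3 syntax as an assumption, and the helper name can_encode is hypothetical. It only tells you whether the character can be encoded for output; whether the terminal's font then has a glyph for it is a separate question that Python cannot answer.

```python
import locale
import sys


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = "his son\u2019s friend"

# sys.stdout.encoding can be missing or None (for example when output is
# redirected), so fall back to the locale's preferred encoding.
stdout_encoding = getattr(sys.stdout, "encoding", None) or locale.getpreferredencoding(False)

print("stdout encoding:", stdout_encoding)
print("can print sample:", can_encode(sample, stdout_encoding))
```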
    

    See also: http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

    See also for character U+2019 (UTF-8 bytes e2 80 99 in hex; search for “2019” on this page): http://www.utf8-chartable.de/unicode-utf8-table.pl?start=8000

    See also: http://www.joelonsoftware.com/articles/Unicode.html

© 2021 The Archive Base. All Rights Reserved