The Archive Base

Editorial Team
Asked: May 26, 2026 (2026-05-26T11:18:38+00:00)

EDIT:

The following print shows my intended value.

(both sys.stdout.encoding and sys.stdin.encoding are ‘UTF-8’).

Why is the variable value different than its print value? I need to get the raw value into a variable.

>>> username = 'Jo\xc3\xa3o'
>>> username.decode('utf-8').encode('latin-1')
'Jo\xe3o'
>>> print username.decode('utf-8').encode('latin-1')
João

Original question:

I’m having an issue querying a DB and decoding the values into Python.

I confirmed my DB character set (NLS_CHARACTERSET) using

select property_value from database_properties where property_name='NLS_CHARACTERSET';

'''AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines 
UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate
characters encoded using UTF-8 (or six bytes per character)'''

os.environ["NLS_LANG"] = ".AL32UTF8"

....
conn_data = str('%s/%s@%s') % (db_usr, db_pwd, db_sid)

sql = "select user_name from apex.users where user_id = '%s'" % userid

...

cursor.execute(sql)
ldap_username = cursor.fetchone()
...

where

print ldap_username
>>'Jo\xc3\xa3o'

I’ve tried both of the following (they return the same result):

ldap_username.decode('utf-8')
>>u'Jo\xe3o'
unicode(ldap_username, 'utf-8')
>>u'Jo\xe3o'

where

u'João'.encode('utf-8')
>>'Jo\xc3\xa3o'

How do I get the query’s result back to the proper ‘João’?

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 11:18 am (2026-05-26T11:18:39+00:00)

    I think you already have the proper ‘João’. The difference between >>> 'Jo\xc3\xa3o' and >>> print 'Jo\xc3\xa3o' is that the former calls repr on the object, while the latter calls str (or unicode, for a Unicode object). The data is the same; only the way the string is displayed differs.

    Some examples might make this more clear:

    >>> print 'Jo\xc3\xa3o'.decode('utf-8')
    João
    >>> 'Jo\xc3\xa3o'.decode('utf-8')
    u'Jo\xe3o'
    >>> print repr('Jo\xc3\xa3o'.decode('utf-8'))
    u'Jo\xe3o'
    

    Notice how the second and third results are identical. The original ldap_username is currently a byte string (a Python 2 str) holding UTF-8 encoded data, not a Unicode object. You can see this at the Python prompt: a byte string is displayed as 'byte string', while a Unicode object is displayed as u'Unicode string'; the key is the leading u.
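    The same distinction can be sketched in Python 3 terms, where the two types are named bytes and str instead of Python 2's str and unicode. This is only an illustration under that assumption, not the code from the question; the names raw and text are made up for the example.

```python
# Python 3 sketch: bytes plays the role of Python 2's str,
# and str plays the role of Python 2's unicode.
raw = b'Jo\xc3\xa3o'           # UTF-8 encoded bytes, like the value returned by the query

text = raw.decode('utf-8')     # bytes -> text (one code point per character)

print(type(raw).__name__)      # bytes
print(type(text).__name__)     # str
print(len(raw), len(text))     # 5 4  (the a-tilde takes two bytes in UTF-8)
print(repr(raw))               # b'Jo\xc3\xa3o'
print(ascii(text))             # 'Jo\xe3o'  (escaped view of the same text)
print(text.encode('latin-1'))  # b'Jo\xe3o' (a-tilde is the single byte 0xe3 in Latin-1)
```

    The escaped forms are just the debugging view of the value. Printing text itself on a UTF-8 terminal shows João, which is why the value in the variable and the printed value look different even though they are the same data.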

    So, as your ldap_username reads as 'Jo\xc3\xa3o' and is a byte string containing UTF-8 data, the following applies:

    >>> 'Jo\xc3\xa3o'.decode('utf-8')
    u'Jo\xe3o'
    >>> print 'Jo\xc3\xa3o'.decode('utf-8') # To Unicode...
    João
    >>> u'João'.encode('utf-8')             # ... back to UTF-8 bytes
    'Jo\xc3\xa3o'
    

    Summed up: you need to determine the type of the string (use type when unsure) and, based on that, either decode bytes to Unicode or encode Unicode to bytes (UTF-8 in this case).
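    One way to apply that rule is a pair of small helpers that check the type before converting. This is a minimal Python 3 sketch (bytes/str rather than Python 2's str/unicode); to_text, to_bytes, and the row tuple are hypothetical names invented for the illustration, and the row merely stands in for a fetched result.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding only if it is text."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value
    raise TypeError('expected bytes or str, got %s' % type(value).__name__)


# Hypothetical stand-in for a one-column row returned by a query.
row = (b'Jo\xc3\xa3o',)
ldap_username = to_text(row[0])

print(ascii(ldap_username))      # 'Jo\xe3o'
print(to_bytes(ldap_username))   # b'Jo\xc3\xa3o'
```

    Decoding once at the boundary where data enters the program, and encoding only when it leaves, avoids applying decode or encode to a value that is already of the target type.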
