The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: May 29, 2026

I thought I knew everything about encodings and Python, but today I came across

I thought I knew everything about encodings and Python, but today I came across a weird problem: although the console is set to code page 850 – and Python reports it correctly – parameters I put on the command line seem to be encoded in code page 1252. If I try to decode them with sys.stdin.encoding, I get the wrong result. If I assume ‘cp1252’, ignoring what sys.stdout.encoding reports, it works.

Am I missing something, or is this a bug in Python? Or in Windows? Note: I am running Python 2.6.6 on Windows 7 EN, with the locale set to French (Switzerland).

In the test program below, I check that literals are correctly interpreted and can be printed – this works. But all values I pass on the command line seem to be encoded wrongly:

#!/usr/bin/python
# -*- encoding: utf-8 -*-
import sys

literal_mb = 'utf-8 literal:   üèéÃÂç€ÈÚ'
literal_u = u'unicode literal: üèéÃÂç€ÈÚ'
print "Testing literals"
print literal_mb.decode('utf-8').encode(sys.stdout.encoding,'replace')
print literal_u.encode(sys.stdout.encoding,'replace')

print "Testing arguments ( stdin/out encodings:",sys.stdin.encoding,"/",sys.stdout.encoding,")"
for i in range(1,len(sys.argv)):
    arg = sys.argv[i]
    print "arg",i,":",arg
    for ch in arg:
        print "  ",ch,"->",ord(ch),
        if ord(ch)>=128 and sys.stdin.encoding == 'cp850':
            print "<-",ch.decode('cp1252').encode(sys.stdout.encoding,'replace'),"[assuming input was actually cp1252 ]"
        else:
            print ""

In a newly created console, when running

C:\dev>test-encoding.py abcé€

I get the following output

Testing literals
utf-8 literal:   üèéÃÂç?ÈÚ
unicode literal: üèéÃÂç?ÈÚ
Testing arguments ( stdin/out encodings: cp850 / cp850 )
arg 1 : abcÚÇ
   a -> 97
   b -> 98
   c -> 99
   Ú -> 233 <- é [assuming input was actually cp1252 ]
   Ç -> 128 <- ? [assuming input was actually cp1252 ]

while I would expect the 4th character to have an ordinal value of 130 instead of 233 (see the code pages 850 and 1252).

Notes: the value of 128 for the euro symbol is a mystery – since cp850 does not have it. Otherwise, the ‘?’ are expected – cp850 cannot print the characters and I have used ‘replace’ in the conversions.
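The ordinals above are consistent with the argument bytes being cp1252 and merely displayed through cp850. The following is a minimal sketch of that round trip using only the standard codecs; the variable names are illustrative and not part of the original test program.

```python
# -*- coding: utf-8 -*-
arg = u'abc\xe9\u20ac'            # the argument typed on the command line: abcé€

raw = arg.encode('cp1252')        # bytes as delivered in the ANSI code page
ordinals = list(bytearray(raw))   # [97, 98, 99, 233, 128]

shown = raw.decode('cp850')       # what a cp850 console displays: abcÚÇ

# For comparison, the ordinal 'é' would have if the bytes really were cp850.
expected_in_cp850 = list(bytearray(u'\xe9'.encode('cp850')))  # [130]
```

Under that assumption the value 128 is no longer a mystery: it is the cp1252 byte for the euro sign, which cp850 happens to render as Ç.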

If I change the code page of the console to 1252 by issuing chcp 1252 and run the same command, I (correctly) obtain

Testing literals
utf-8 literal:   üèéÃÂç€ÈÚ
unicode literal: üèéÃÂç€ÈÚ
Testing arguments ( stdin/out encodings: cp1252 / cp1252 )
arg 1 : abcé€
   a -> 97
   b -> 98
   c -> 99
   é -> 233
   € -> 128

Any ideas what I’m missing?

Edit 1: I’ve just tested by reading sys.stdin. This works as expected: in cp850, typing ‘é’ results in an ordinal value of 130. So the problem really affects the command line only. Is the command line treated differently from standard input?

Edit 2: It seems I had the wrong keywords. I found another very close topic on SO: “Read Unicode characters from command-line arguments in Python 2.x on Windows”. Still, if the command line is not encoded like sys.stdin, and since sys.getdefaultencoding() reports ‘ascii’, it seems there is no way to know its actual encoding. I find the answer using win32 extensions pretty hacky.


1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 2:44 pm

    Replying to myself:

    On Windows, the encoding used by the console (thus, that of sys.stdin/out) differs from the encoding of various OS-provided strings – obtained through e.g. os.getenv(), sys.argv, and certainly many more.
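To see the distinction on a given machine, the code pages can be queried side by side. This is a rough sketch, assuming the Win32 functions GetACP, GetOEMCP, GetConsoleCP and GetConsoleOutputCP are reachable through ctypes; the helper name windows_code_pages is made up for illustration, and it simply returns None on other platforms.

```python
import sys


def windows_code_pages():
    """Return the ANSI, OEM and console code pages, or None when not on Windows."""
    if sys.platform != 'win32':
        return None
    import ctypes  # imported lazily: windll only exists on Windows
    k32 = ctypes.windll.kernel32
    return {
        'ansi': k32.GetACP(),                    # used for argv and environment strings
        'oem': k32.GetOEMCP(),                   # default for new consoles
        'console_in': k32.GetConsoleCP(),        # changed by chcp
        'console_out': k32.GetConsoleOutputCP(),
    }


pages = windows_code_pages()
```

On the setup described in the question, one would expect the ANSI value to be 1252 and the console values to be 850, but that has not been verified here.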

    The encoding provided by sys.getdefaultencoding() is really just that: a default, chosen by the Python developers as the “most reasonable encoding” for the interpreter to use in extreme cases. I get ‘ascii’ on my Python 2.6, and I tried portable Python 3.1, which yields ‘utf-8’. Neither is what we are looking for; they are merely fallbacks for encoding conversion functions.

    As this page seems to state, the encoding used by OS-provided strings is governed by the Active Code Page (ACP). Since Python does not have a native function to retrieve it, I had to use ctypes:

    import ctypes
    # GetACP() returns the ANSI code page number, e.g. 1252 (Windows only)
    os_encoding = 'cp' + str(ctypes.windll.kernel32.GetACP())
    

    Edit: But as Jacek suggests, there actually is a more robust and Pythonic way to do it (the semantics would need validation, but until proven wrong, I’ll use this):

    import locale
    os_encoding = locale.getpreferredencoding()
    # This returns 'cp1252' on my system, yay!
    

    and then

    import os
    import sys
    u_argv = [x.decode(os_encoding) for x in sys.argv]
    u_env = os.getenv('myvar', '').decode(os_encoding)  # '' if unset, so decode() never sees None
    

    On my system, os_encoding = 'cp1252', so it works. I am quite certain this would break on other platforms, so feel free to edit and make it more generic. We would certainly need some kind of translation table between the ACP reported by Windows and the Python encoding name – something better than just prepending ‘cp’.
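One possible shape for such a translation table is sketched below. The function name codepage_to_codec and the contents of the override table are assumptions made for illustration; only codecs.lookup() is a real library call, and the table is far from complete.

```python
import codecs

# Hypothetical overrides for code pages whose Python codec is not named 'cp<N>'.
_SPECIAL_CODE_PAGES = {
    65001: 'utf-8',
    20127: 'ascii',
    28591: 'latin-1',
    1200: 'utf-16-le',
}


def codepage_to_codec(codepage, fallback=None):
    """Map a Windows code page number to a Python codec name, or return fallback."""
    name = _SPECIAL_CODE_PAGES.get(codepage, 'cp%d' % codepage)
    try:
        # codecs.lookup() raises LookupError for names Python does not know.
        return codecs.lookup(name).name
    except LookupError:
        return fallback
```

With this, codepage_to_codec(1252) gives 'cp1252', while an unknown number yields the fallback instead of failing later inside decode().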

    This is unfortunately a hack, although I find it a bit less intrusive than the one suggested by this ActiveState Code Recipe (linked to by the SO question mentioned in Edit 2 of my question). The advantage I see here is that this can be applied to os.getenv(), and not only to sys.argv.
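Since the same decoding applies to both sys.argv and environment values, it can be wrapped in one small helper. The sketch below assumes locale.getpreferredencoding() reports the right encoding, as it did on the system described above; the name to_text is hypothetical. It leaves values alone when they are already text, which is the case on Python 3.

```python
import locale
import os
import sys


def to_text(value, encoding=None):
    """Decode a byte string with the preferred encoding; pass text through unchanged."""
    if encoding is None:
        encoding = locale.getpreferredencoding()
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        return value.decode(encoding, 'replace')
    return value


u_argv = [to_text(a) for a in sys.argv]
u_env = to_text(os.environ.get('myvar', ''))  # 'myvar' is the placeholder name used above
```

Passing the encoding explicitly, for example to_text(raw, 'cp1252'), makes the helper easy to check without relying on the machine's locale.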
