If a python script uses the open(filename, r) function to open, and subsequently read,

Asked: May 21, 2026 (2026-05-21T20:15:35+00:00)

If a Python script uses the open("filename", "r") function to open, and subsequently read, the contents of a text file, how can I tell which encoding this file is supposed to have?

Note that since I’m executing this script from my own program, if there is any way to control this through environment variables, then that is good enough for me.

This is Python 2.7 by the way.

The code in question comes from Mercurial. It can be given a list of files (to add to the repository, say) through a file on disk, instead of having them passed on the command line.

So basically, instead of this:

hg add A B C

I can write out A, B and C to a file, with newlines between each, and then execute the following:

hg add listfile:input.txt

The code that ends up reading this file is this:

files = open(name, 'r').read().split(delimiter)

Hence my question. The answer I was given on IRC when I asked which encoding I should use was this:

it is the same encoding than the one you use on command line when passing a file argument

I take this to mean that it is the same encoding I “use” when I execute Mercurial (hg). I have no idea which encoding that is, since I just hand everything to the .NET Process object, so I am asking here.

1 Answer

Editorial Team · Answer 1 · 2026-05-21T20:15:36+00:00

You can’t. Reading a file is independent of its encoding; you’ll need to know the encoding in advance in order to properly interpret the bytes you read in.

For example, if you know the file is encoded in UTF-8:

with open('filename', 'rb') as f:
    contents = f.read().decode('utf-8-sig')    # -sig deals with BOM, if present

Or if you know the file is ASCII only:

with open('filename', 'r') as f:
    contents = f.read()    # results in a str object
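To see why the encoding has to be known in advance, here is a small sketch (the sample string is just an illustration, not anything from Mercurial). The same bytes decode without error under two different encodings, but only one result is the intended text:

```python
# The same bytes on disk, interpreted under two different encodings.
raw = u"caf\xe9".encode("utf-8")      # the bytes b'caf\xc3\xa9'

as_utf8 = raw.decode("utf-8")         # the intended text
as_cp1252 = raw.decode("cp1252")      # wrong text, yet no error is raised

print(repr(as_utf8))
print(repr(as_cp1252))
```

Nothing in the bytes themselves says which of the two readings is correct, which is why the reader has to be told.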

If you really don’t know the encoding of the file, then there’s obviously no guarantee that you can read it properly; however, you can guess at the encoding using a tool like chardet.
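If installing a third-party detector is not an option, a cruder stand-in is to try a short list of candidate encodings in order. This is only a sketch: the function name guess_decode and the candidate list are my own assumptions, it is not what chardet does internally, and a successful decode does not prove the guess is right.

```python
def guess_decode(raw, candidates=("utf-8-sig", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes raw cleanly.

    The candidate list is an assumption; order it from strictest to loosest.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this fallback cannot fail,
    # but the resulting text may well be wrong.
    return raw.decode("latin-1"), "latin-1"


text, encoding = guess_decode(b"caf\xe9")
print(repr(text), encoding)
```

UTF-8 goes first because it rejects most byte sequences that were not written as UTF-8, so a clean decode there is a reasonably strong hint.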

UPDATE:

I think I understand your question now. I thought you had a file you needed to write code for, but it seems you have code you need to write a file for 😉

The code in question probably only deals properly with plain ASCII (it’s possible the strings are converted later, but unlikely I think). So you’ll want to make a text file that contains only ASCII (codepoint < 128) characters, and make sure it is saved in an ASCII encoding (i.e. not UTF-16 or anything like that). This is a little unfortunate considering that Mercurial deals with filenames, which can contain Unicode characters.
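If you generate the list file from Python, a minimal sketch along these lines would enforce the ASCII-only restriction. The helper name write_listfile is hypothetical, and the newline delimiter is an assumption taken from the question's description, not from Mercurial's source:

```python
import os
import tempfile


def write_listfile(path, names):
    """Write names separated by newlines, refusing anything that is not pure ASCII."""
    for name in names:
        try:
            name.encode("ascii")
        except (UnicodeEncodeError, UnicodeDecodeError):
            raise ValueError("non-ASCII name: %r" % (name,))
    payload = "\n".join(names).encode("ascii")
    # Binary mode, so the bytes on disk are exactly the bytes built above.
    with open(path, "wb") as f:
        f.write(payload)


# Usage: write a list to a temporary file and read the raw bytes back.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    write_listfile(path, ["A", "B", "C"])
    with open(path, "rb") as f:
        roundtrip = f.read()
finally:
    os.remove(path)
```

The resulting file could then be passed as in the question, for example hg add listfile:input.txt. No trailing newline is written, because splitting on the delimiter would otherwise produce an empty final entry.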
