I have a Python script that read a list of path names from a

Question

Asked: May 27, 2026

I have a Python script that reads a list of path names from a file and opens them using the gzip module. It works well under Linux, but when I ran it under Windows, I got an error when calling the gzip.open function. The error message is as follows:

File "C:\dev_tools\Python27\lib\gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
File "C:\dev_tools\Python27\lib\gzip.py", line 89, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: file() argument 1 must be encoded string without NULL bytes, not str

The filename should be something like

‘G:\ext_pt1\cfx33_50instr4_testset\cfx33_50instr4_0-99\cfx33_50instr4_cov\cfx33_50instr4_id0_cov\cfx33_50instr4_id0.detail.rpt.gz’

But when I printed the filename, it printed out something like

‘ ■G : \ e x t _ p t 1 \ c f x 3 3 _ 5 0 i n s t r 4 _ t e s t s e t \
c f x 3 3 _ 5 0 i n s t r 4 _ 0 – 9 9 \ c f x 3 3 _ 5 0 i n s t r 4 _
c o v \ c f x 3 3 _ 5 0 i n s t r 4 _ i d 0 _ c o v \ c f x 3 3 _ 5 0
i n s t r 4 _ i d 0 . d e t a i l . r p t . g z’

And when I printed repr(filename), it printed out something like

‘\xff\xfeG\x00:\x00\\x00e\x00x\x00t\x00_\x00p\x00t\x001\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00s\x00t\x00r\x004\x00_\x00t\x00e\x00s\x00t\x00s\x00e\x00t\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00\x00t\x
00r\x004\x00_\x000\x00-\x009\x009\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00\x00t\x00r\x004\x00_\x00c\x00o\x00v\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00s\x00t\x00r\x004\x00_\x00i\x00d\x000\x00_\x00c\x00o\x00v\x00\\x00c\x00f\x00x\x003\x003\x00_\x005\x000\x00i\x00n\x00s\x00t\x00r\x004\x00_\x00i\x00d\x000\x00.\x00d\x00e\x00t\x00a\x00i\x00l\x00.\x00r\x00p\x00t\x00.\x00g\x00z\x00’

I don’t know why Python added those spaces (possibly the NULL bytes?) when it read the file. Does anyone have any clue?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T04:32:13+00:00

Python has not added anything; it has merely read what is in the file. You have a little-endian UTF-16 string there, as you can plainly tell by the byte-order mark (\xff\xfe) in the first two bytes. In that encoding every ASCII character is followed by a NUL byte, which is what shows up as the “spaces” when printed. If you are not expecting this, you could convert it to ASCII (assuming it doesn’t have any non-ASCII characters).

# convert mystring from little-endian UTF-16 with optional BOM to ASCII
mystring = unicode(mystring, encoding="utf-16le").encode("ascii", "ignore")

Or just convert it to proper Unicode and use it that way, if Windows will tolerate it:

mystring = unicode(mystring, encoding="utf-16le").lstrip(u"\ufeff")

Above, I have manually specified the byte order and then stripped off the BOM, rather than specifying “utf-16” as the encoding and letting Python figure out the byte order. This is because the BOM is going to be found once at the beginning of the file, not at the beginning of each line, so if you are converting the lines to Unicode one at a time, you won’t have a BOM most of the time.
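If the whole list file is UTF-16, another option is to decode it while reading, so the BOM is consumed once at the start of the file and every line arrives as proper Unicode. The following is only a minimal sketch, assuming the file really is UTF-16 with a BOM; `read_path_list` and its argument are hypothetical names, not part of the original script. `io.open` exists in Python 2.6+ and is the same function as the built-in `open` in Python 3.

```python
import io


def read_path_list(list_path):
    """Return the non-empty lines of a UTF-16 text file as Unicode strings.

    Hypothetical helper: assumes the file starts with a BOM, which the
    "utf-16" codec uses to pick the byte order and then discards.
    """
    with io.open(list_path, encoding="utf-16") as handle:
        # Strip surrounding whitespace (including line endings) and skip blank lines.
        return [line.strip() for line in handle if line.strip()]
```

Each returned string could then be handed to gzip.open, provided Windows accepts the Unicode path as hoped above.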

However, it might make more sense to go back to the source of that file and figure out why it’s being saved in little-endian UTF-16 if you expected ASCII. Is the file generated the same way on Linux and Windows, for instance? Has it been touched by a text editor that defaults to saving as Unicode? Etc.
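One quick way to check how a given copy of the file was saved is to look at its first bytes for a byte-order mark. The sketch below is illustrative only: `sniff_bom` is a hypothetical helper, and a file without a BOM may still be UTF-16 or some other encoding, so a `None` result proves nothing on its own.

```python
import codecs

# BOMs to look for. UTF-32 LE is listed before UTF-16 LE because its
# BOM begins with the same two bytes (\xff\xfe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(path):
    """Return the encoding implied by the file's BOM, or None if there is no BOM."""
    with open(path, "rb") as handle:
        head = handle.read(4)  # the longest BOM checked here is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

Running this on the list file as it exists on Linux and on Windows would show whether the two copies were saved differently.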
