I have the following file:
abcde kwakwa <0x1A> line3 linllll
Where <0x1A> represents a byte with the hex value of 0x1A. When attempting to read this file in Python as:
for line in open('t.txt'): print line,
It only reads the first two lines, and exits the loop.
The solution seems to be to open the file in binary (or universal newline) mode: 'rb' or 'rU'. Can you explain this behavior?
0x1A is Ctrl-Z, and DOS historically used that as an end-of-file marker. For example, try opening a command prompt and running 'type' on your file. It will only display the content up to the Ctrl-Z.
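Python uses the Windows CRT function _wfopen, which implements the 'Ctrl-Z is EOF' semantics for files opened in text mode. Opening the file in binary mode ('rb') bypasses that text-mode translation, so the bytes after the 0x1A are read as well.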
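
As a rough illustration of the binary-mode workaround, here is a minimal sketch. The helper names (make_sample, read_lines_binary) and the exact layout of the sample file are assumptions made for the example, not part of the original question. It writes a file containing a 0x1A byte and reads it back in 'rb' mode, where the byte is treated as ordinary data rather than as an end-of-file marker.

```python
import os
import tempfile


def make_sample(path):
    # Hypothetical sample: the 0x1A byte sits on its own line.
    # The real layout of the asker's file is an assumption here.
    with open(path, 'wb') as f:
        f.write(b'abcde\nkwakwa\n\x1a\nline3\nlinllll\n')


def read_lines_binary(path):
    # Binary mode skips the CRT text-mode translation, so 0x1A
    # is returned as a normal byte instead of ending the read.
    with open(path, 'rb') as f:
        return f.read().splitlines()


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 't.txt')
    make_sample(path)
    for line in read_lines_binary(path):
        print(repr(line))
```

Note that in binary mode the lines come back as raw bytes and, on Windows, any '\r\n' line endings are not converted to '\n' while reading, so the caller has to handle them (splitlines does so here).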