I am trying to read some numeric data from a text file but am struggling to read numbers stored without any delimiters. The file format itself is a fairly standard format used in numerous codes around the world and so cannot be changed. The following is a snippet of the head of an example file:
SOME TEXT OF A FIXED LENGTH 33
3.192839854E+00 3.189751983E+00 3.186795271E+00 3.183874776E+00 3.180986976E+00
3.178133610E+00 3.175318116E+00 3.172544681E+00 3.169818171E+00 3.167143271E+00
3.164524875E+00 3.161968464E+00 3.159479193E+00 3.157062171E+00 3.154723040E+00
3.152466964E+00 3.150299067E+00 3.148224863E+00 3.146249721E+00 3.144379226E+00
3.142619004E+00 3.140974218E+00 3.139450283E+00 3.138052814E+00 3.136786929E+00
3.135657986E+00 3.134671499E+00 3.133833067E+00 3.133149899E+00 3.132631559E+00
3.132282773E+00 3.132080343E+00 3.131954939E+00
-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01
-5.291415409E-01-5.219573970E-01-5.137449740E-01-5.045337620E-01-4.943949468E-01
-4.832213992E-01-4.710109577E-01-4.578747780E-01-4.436967869E-01-4.285062978E-01
-4.123986122E-01-3.952894227E-01-3.771859951E-01-3.580934057E-01-3.379503384E-01
-3.168282028E-01-2.947799605E-01-2.716835737E-01-2.476267515E-01-2.226373818E-01
-1.966313850E-01-1.696421504E-01-1.415353640E-01-1.118510940E-01-8.041086734E-02
-4.968321601E-02-2.772555484E-02-2.631111359E-02
....
The first line contains some comments (of a fixed length) followed by an integer which gives the length of arrays which follow. The arrays themselves are stored as a list of numbers of fixed width. In this case the first array shouldn’t cause me any problems. However, as you can see from the second array, all the numbers are negative and thus there are no spaces between the numbers. Therefore, methods such as str.split() will not return a list of numbers. I would be grateful for any suggestions about how best to process this file.
One final bit of information which may be important: the arrays themselves contain newline characters, i.e. the following code
with open('some_file') as fh:
    data = [line for line in fh]
npts = int(data.pop(0).split()[-1])
print(data)
returns:
[' 3.192839854E+00 3.189751983E+00 3.186795271E+00 3.183874776E+00 3.180986976E+00\n',
' 3.178133610E+00 3.175318116E+00 3.172544681E+00 3.169818171E+00 3.167143271E+00\n',
' 3.164524875E+00 3.161968464E+00 3.159479193E+00 3.157062171E+00 3.154723040E+00\n',
' 3.152466964E+00 3.150299067E+00 3.148224863E+00 3.146249721E+00 3.144379226E+00\n',
' 3.142619004E+00 3.140974218E+00 3.139450283E+00 3.138052814E+00 3.136786929E+00\n',
' 3.135657986E+00 3.134671499E+00 3.133833067E+00 3.133149899E+00 3.132631559E+00\n',
' 3.132282773E+00 3.132080343E+00 3.131954939E+00\n',
'-5.487648393E-01-5.476736110E-01-5.447693831E-01-5.405765060E-01-5.353610408E-01\n',
'-5.291415409E-01-5.219573970E-01-5.137449740E-01-5.045337620E-01-4.943949468E-01\n',
'-4.832213992E-01-4.710109577E-01-4.578747780E-01-4.436967869E-01-4.285062978E-01\n',
'-4.123986122E-01-3.952894227E-01-3.771859951E-01-3.580934057E-01-3.379503384E-01\n',
'-3.168282028E-01-2.947799605E-01-2.716835737E-01-2.476267515E-01-2.226373818E-01\n',
'-1.966313850E-01-1.696421504E-01-1.415353640E-01-1.118510940E-01-8.041086734E-02\n',
'-4.968321601E-02-2.772555484E-02-2.631111359E-02\n', ... ]
Hopefully this is relatively clear – let me know if you require more information about the file format.
Since each entry is exactly sixteen characters in width, the following will convert one line of your input file into a list of floats:
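A minimal sketch of that conversion, slicing the line every sixteen characters and passing each slice to `float`. The names `FIELD_WIDTH` and `parse_line` are my own choices for illustration, not part of the file format:

```python
FIELD_WIDTH = 16  # assumed fixed width of every numeric field


def parse_line(line, width=FIELD_WIDTH):
    """Convert one fixed-width line (without its newline) into a list of floats."""
    # float() tolerates the leading space that pads positive numbers.
    return [float(line[i:i + width]) for i in range(0, len(line), width)]


sample = "-5.487648393E-01-5.476736110E-01-5.447693831E-01"
print(parse_line(sample))
```

Slicing by position rather than splitting on whitespace is what makes the negative numbers, which have no separating space, parse correctly.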
Here, I assume that the line does not contain a trailing newline; if it might, str.rstrip can be used to remove it first. The following code snippet also demonstrates how to split the sequence of numbers into chunks of n (note that it doesn’t attempt to parse the header line):
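One way this could look, as a sketch rather than a definitive reader: strip the newline from each line, slice it into sixteen-character fields, collect all values into one flat list, and then cut that list into arrays of `n` values. The helper names `chunks` and `read_arrays` are hypothetical, and the code assumes the header line has already been removed and that no line carries trailing padding after its last field.

```python
def chunks(seq, n):
    """Yield successive slices of seq holding n items each (the last may be shorter)."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


def read_arrays(lines, n, width=16):
    """Parse fixed-width data lines and group the values into arrays of n floats."""
    values = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop the newline, keep leading padding
        values.extend(float(line[i:i + width])
                      for i in range(0, len(line), width))
    return list(chunks(values, n))


# Two arrays of three values each, wrapped across lines as in the real file.
demo = [" 1.000000000E+00 2.000000000E+00\n",
        "-3.000000000E+00\n",
        "-4.000000000E-01-5.000000000E-01\n",
        " 6.000000000E+00\n"]
print(read_arrays(demo, 3))
```

With the question's file, `n` would be the `npts` value read from the header, and `lines` the remaining entries of `data`.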