I am using Python with NumPy to read in data from a numerical model, stored in a text file with a fairly complicated format.
NumPy’s genfromtxt and fromfile functions work well, but only if the data is regularly structured. My data files look something like this:
---- snip
[sitename] [dimension 1 size] [dimension 2 size]
[data for dim 1]
[data for dim 2]
[date/time]
[header data]
[data (dim1 * dim2)]
[header]
[data]
...
.
.
[date/time]
[header]
[data]
.
.
etc...
---- snip
So, I have a mixture of text and numbers and a complicated (but repeating) layout. What is the best way to read this in using NumPy?
Cheers,
Chris
NumPy isn’t designed for general-purpose parsing, so you’d do well to look beyond it; what you choose will depend mostly on how consistent the files are.
If they’re unusually consistent, so that, say, you can extract numbers from known positions in known rows, then you can just read the file line by line as strings and slice each line at the characters you want. (Step through the file, e.g., using file.readlines to get each line as a string.)
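A minimal sketch of the fixed-position approach, assuming a hypothetical layout where each row starts with a 10-character site name followed by numbers in 8-character columns. The widths, `parse_fixed_width`, and the sample rows are made up for illustration. The function accepts any iterable of lines, so an open file object can be passed directly instead of calling readlines first:

```python
import numpy as np

# Hypothetical fixed-width layout (assumed for illustration only):
# a 10-character site name, then numbers in 8-character columns.
NAME_WIDTH = 10
FIELD_WIDTH = 8

def parse_fixed_width(lines, n_fields):
    """Slice each line at known character positions."""
    names, rows = [], []
    for line in lines:
        names.append(line[:NAME_WIDTH].strip())
        rows.append([
            # float() ignores the padding spaces inside each column
            float(line[NAME_WIDTH + i * FIELD_WIDTH:NAME_WIDTH + (i + 1) * FIELD_WIDTH])
            for i in range(n_fields)
        ])
    return names, np.array(rows)

# Adjacent string literals make the column widths easy to check.
SAMPLE_LINES = [
    "SITE_A    " "    1.50" "    2.25" "    3.00\n",
    "SITE_B    " "    4.75" "    5.00" "    6.25\n",
]

names, values = parse_fixed_width(SAMPLE_LINES, n_fields=3)
print(names)         # ['SITE_A', 'SITE_B']
print(values.shape)  # (2, 3)
```

The fragile part is getting the widths right: one miscounted column shifts every number after it, which is why this only suits very regular files.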
The usual case (at least in my experience) is that the files are more varied than that, but simple string operations such as str.split (almost always my first step) are still enough to parse each line.
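Applied to the layout in the question, a split-based reader might look like the sketch below. It assumes each bracketed item sits on its own line, that a data block may wrap over several lines, and that the dim1 * dim2 values are stored with dim2 varying fastest; `read_model_output` and the sample text are hypothetical. The date/time and header lines are kept as raw strings because their exact format isn't known:

```python
import io
import numpy as np

def read_model_output(f):
    """Hypothetical reader for the repeating layout sketched in the question."""
    lines = (line for line in f if line.strip())  # blank lines are skipped
    site, n1, n2 = next(lines).split()
    n1, n2 = int(n1), int(n2)
    dim1 = np.array(next(lines).split(), dtype=float)
    dim2 = np.array(next(lines).split(), dtype=float)

    records = []
    for stamp in lines:  # each record starts with a date/time line
        header = next(lines).strip()
        values = []
        # Collect tokens until the full dim1 * dim2 block is read,
        # however many lines the model wrapped it over.
        while len(values) < n1 * n2:
            values.extend(next(lines).split())
        data = np.array(values, dtype=float).reshape(n1, n2)
        records.append((stamp.strip(), header, data))
    return site, dim1, dim2, records

SAMPLE = """\
SITE1 2 3
0.0 1.0
10.0 20.0 30.0
2009-01-01 00:00
header for step 1
1 2 3
4 5 6
2009-01-01 01:00
header for step 2
7 8 9 10 11 12
"""

site, dim1, dim2, records = read_model_output(io.StringIO(SAMPLE))
# Stack all time steps into one array of shape (n_times, n1, n2).
cube = np.stack([data for _, _, data in records])
print(cube.shape)  # (2, 2, 3)
```

Counting tokens rather than lines is the main design choice here: it makes the reader indifferent to how the model wraps long rows. If the file ends partway through a record, next() raises StopIteration, which a real reader would want to turn into a clearer error.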
Beyond this, there are lots of parsing libraries in Python. I’m partial to pyparsing (but I don’t know the others well, so it’s not a fair comparison). Here’s a summary of the various parsing libraries.