I have been given some ‘reports’ from another piece of software that contain data I need to use. The file format is quite simple: a description line that starts with a # and gives the variable name and a description, followed by comma-separated data on the next line.
For example:
#wavelength,'<a comment describing the data>'
400.0,410.0,420.0, <and so on>
#reflectance,'<a comment describing the data>'
0.001,0.002,0.002, <and so on>
#date,'time file was written'
2012-03-06 13:12:36.694597 < this is the bit that stuffs me up!! >
When I first wrote some code I expected all the data to be read as floats, but I have since discovered some dates and strings. For my purposes, all I care about is the data that should be arrays of floats. Everything else I read in (such as dates) can be treated as strings, even if it is technically a date, for example.
My first attempt, which worked until I found non-floats, basically skips the # and grabs the characters following it, making a dictionary whose key is the characters it just read. Then I made the entry for that key an array by splitting on the commas and stacking rows for 2-D data, similar to the following code.
    data = f.read()  # read() returns one string; readlines() returns a list, which has no split()
    dataLines = data.split('\n')
    for i in range(0, len(dataLines) - 1):
        if dataLines[i][0] == '#':
            key, comment = dataLines[i].split(',')
            keyList.append(key[1:])
            k += 1
        else:  # it must be data
            d += 1
            dataList.append(dataLines[i])
    for j in range(0, len(dataList)):
        tmp = dataList[j]
        x = map(float, tmp.split(','))
        tempData = vstack((tempData, asarray(x)))
    self.__report[keyList[k]] = tempData
When I find a non-float in my file, the line "x = map(float,tmp.split(','))" fails (there are no commas in that line of data). I thought I would test whether it is a string using isinstance, but the file reader treats all of the data coming in from the file as strings (of course). So I tried converting the line from the file to a float array, thinking that if it fails I could just treat it as an array of strings, like this:
    try:
        scipy.array(tmp, dtype=float64)  # try to convert
        x = map(float, tmp.split(','))
    except:  # ValueError: # must be a string
        x = zeros((1, 1))
        x = asarray([tmp])
        #tempData = vstack((tempData,asarray(x)),dtype=str)
    if 'tempData' in locals():
        pass
    else:
        tempData = zeros((len(x)))
    tempData = vstack((tempData, asarray(x)))
However, this results in EVERYTHING being read in as a character array, so I cannot index the data as a numeric numpy array. All of the data is there in the dictionary, but the dtype is |S8, for example. It seems the try block is going straight to the exception.
I would appreciate any advice on getting this to work so I can discriminate between floats and strings. I don’t know the order of the data before I get the report.
Also, the big files can take quite a long time to load into memory. Any advice on how to make this more efficient would also be appreciated.
Thanks
I’m assuming that in the end you are interested in x, which should be in the format [400.0, 410.0, 420.0]. One way to handle this is to separate the splitting on commas and the conversion to float into two different statements, so that you can catch ValueError when you get string elements instead of float or int. Also notice the other minor changes I’ve made to your code, which make it more Pythonic.
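The answer's original code is not reproduced here, so the following is only a minimal sketch of the idea, under stated assumptions: the function name parse_report and the dictionary layout are hypothetical, every data line belongs to the most recent # header, and all numeric rows under one key have the same length. The split and the float conversion are separate statements, and only ValueError is caught.

```python
import numpy as np


def parse_report(lines):
    """Map each '#key' header to a 2-D float array, or to a list of raw strings.

    `lines` is any iterable of text lines (a list or an open file object).
    parse_report is a hypothetical name used for illustration only.
    """
    rows_by_key = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('#'):
            # Header line '#name,comment': keep only the name as the key
            key = line[1:].split(',', 1)[0].strip()
            rows_by_key[key] = []
        elif key is not None:
            fields = line.split(',')  # step 1: split only
            try:
                row = [float(v) for v in fields]  # step 2: convert
            except ValueError:
                row = line  # not numeric (e.g. a date): keep the whole line
            rows_by_key[key].append(row)

    report = {}
    for name, rows in rows_by_key.items():
        if rows and all(isinstance(r, list) for r in rows):
            # Assumes equal-length rows, so they stack into a 2-D float array
            report[name] = np.array(rows, dtype=float)
        else:
            report[name] = rows  # leave string data as a list of strings
    return report
```

Because the function only iterates over its argument, an open file object can be passed directly (for example parse_report(f) inside a with open(...) block), which avoids holding a second full copy of the file in memory. Collecting rows in a list and building the array once per key also avoids calling vstack for every row, which copies the growing array each time.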