I have been given some ‘reports’ from another piece of software that contains data


Asked: May 30, 2026 (2026-05-30T23:41:59+00:00)


I have been given some ‘reports’ from another piece of software that contain data I need to use. The file format is quite simple: a description line starting with # gives the variable name and a description, and the next line holds the comma-separated data.

For example:

    #wavelength,'<a comment describing the data>'
    400.0,410.0,420.0, <and so on>
    #reflectance,'<a comment describing the data>'
    0.001,0.002,0.002, <and so on>
    #date,'time file was written'
    2012-03-06 13:12:36.694597  < this is the bit that stuffs me up!! >

When I first wrote some code I expected all the data to be read as floats, but I have since discovered some dates and strings. For my purposes, all I care about is the data that should be arrays of floats. Everything else I read in (such as dates) can be treated as strings, even if it is technically a date, for example.

My first attempt, which worked until I found non-floats, basically skips the # and grabs the characters following it, making a dictionary whose key is the characters it just read. I then made the entry for that key an array by splitting on the commas and stacking rows for 2-D data, similar to the following code.

    data = f.readlines()
    dataLines = data.split('\n')

    for i in range(0,len(dataLines)-1):
        if dataLines[i][0] == '#':
            key,comment = dataLines[i].split(',')
            keyList.append(key[1:])
            k+=1
        else: # it must be data
            d+=1
            dataList.append(dataLines[i])

        for j in range(0,len(dataList)):
            tmp = dataList[j]

            x = map(float,tmp.split(','))
            tempData = vstack((tempData,asarray(x)))

    self.__report[keyList[k]] = tempData

When I hit a non-float in my file, the line "x = map(float,tmp.split(','))" fails (there are no commas in that line of data). I thought I would test whether it is a string using isinstance, but the file reader treats all data coming in from the file as strings (of course). So I tried converting the line from the file to a float array, thinking that if it fails I can just treat it as an array of strings, like this:

     try:
         scipy.array(tmp,dtype=float64)  #try to convert
         x = map(float,tmp.split(','))

     except:# ValueError: # must be a string
         x = zeros((1,1))
         x = asarray([tmp])
         #tempData = vstack((tempData,asarray(x)),dtype=str)
         if 'tempData' in locals():
             pass
         else:
             tempData = zeros((len(x)))

         tempData = vstack((tempData,asarray(x)))

However, this results in everything being read in as a character array, so I cannot index the data as a numeric numpy array. All of the data is there in the dictionary, but the dtype is |S8, for example. It seems the try block is going straight to the exception.

I would appreciate any advice on getting this to work so I can discriminate between floats and strings. I don’t know the order of the data before I get the report.

Also, the big files can take quite a long time to load into memory; any advice on making this more efficient would also be appreciated.

Thanks


1 Answer

Editorial Team · Answer 1 · 2026-05-30T23:42:01+00:00

I’m assuming that ultimately you are interested in x, which should be in the format [400.0, 410.0, 420.0].

One way to handle this is to split on commas and convert to float in two separate statements, so that you can catch the ValueError raised when an element is a string rather than a float or int.

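    keyList = []
    dataList = []
    with open('sample_data', 'r') as f:
        for line in f:  # iterate over lines; f.readline() would yield single characters
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                # partition splits on the first comma only, so commas in the comment survive
                key, _, comment = line.partition(',')
                keyList.append(key[1:])
            else:  # it must be data
                dataList.append(line)

    for data in dataList:
        data_list = data.split(',')
        try:
            # build the list eagerly: in Python 3 a bare map() is lazy and would not raise here
            x = [float(item) for item in data_list]
        except ValueError:
            x = data  # not numeric (e.g. a date), so keep the line as a string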

Also note the other minor changes I’ve made to your code to make it more Pythonic.
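To tie this back to the dictionary you were building, here is a minimal sketch of the whole thing, under some assumptions. It assumes every data line belongs to the most recent # line, and that all numeric rows under one key have the same length. The function name parse_report and the sample comments ('nm', 'fraction') are made up for illustration. Note that in your second attempt the call array(tmp, dtype=float64) is handed the whole unsplit line, which raises ValueError even for numeric lines, so the except branch runs every time; that would explain the string dtype you are seeing.

```python
import io

import numpy as np


def parse_report(lines):
    """Parse '#key,comment' header lines, each followed by data lines.

    Returns a dict mapping each key to a float array when every field
    converts cleanly, and to the raw string(s) otherwise.
    """
    rows_by_key = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('#'):
            # Text between '#' and the first comma is the key.
            key = line[1:].partition(',')[0]
            rows_by_key[key] = []
        elif key is not None:
            try:
                row = [float(field) for field in line.split(',')]
            except ValueError:
                row = line  # not numeric (e.g. a date): keep the string
            rows_by_key[key].append(row)

    report = {}
    for key, rows in rows_by_key.items():
        single = len(rows) == 1
        if rows and all(isinstance(r, list) for r in rows):
            # Convert once at the end: one row -> 1-D, several rows -> 2-D.
            report[key] = np.array(rows[0] if single else rows)
        else:
            report[key] = rows[0] if single else rows
    return report


# Made-up sample in the format described in the question.
sample = io.StringIO(
    "#wavelength,'nm'\n"
    "400.0,410.0,420.0\n"
    "#reflectance,'fraction'\n"
    "0.001,0.002,0.002\n"
    "#date,'time file was written'\n"
    "2012-03-06 13:12:36.694597\n"
)
report = parse_report(sample)
```

Passing an open file object works the same way as the StringIO here, since the function only iterates over lines. On the speed question, vstack copies the whole array each time it is called, so collecting plain lists and converting to an array once per key should generally be cheaper for big files, though I have not timed it on your data.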
