I am switching to Python and NumPy from IDL (which is somewhat like Matlab). This is a fairly open-ended question about handling data; maybe someone can help.
The usual situation with my data is that I have a fixed class of data, perhaps from a spreadsheet, a database, etc. I am trying to figure out which data structures are best to use in Python and NumPy.
I know about the csv module and can use csv.DictReader() to read a spreadsheet. This reads line by line and makes a dictionary with the proper names from the spreadsheet header (first line).
import csv

f = open(file, 'rU')
dat = csv.DictReader(f)
data = []  # makes an empty list
i = 0
for row in dat:
    data.append(row)
    if i == 0:
        # only on the first row: grab and print the column names
        keys = row.keys()
        print "keys"
        print keys
        print
    i = i + 1
f.close()
First of all, that is kinda a lot of code to read a csv file into a list of dictionaries and get the keys. Is there a faster/better way?
But now, I wonder whether an array of dictionaries is really what I want. Should I make a class of objects and make this an array of objects? Or something else?
If I have my array of dictionaries, “data”, I would get some “column” like
age=array([dat["age"] for dat in data])
Is that the right way to do it? Is there no way like “age=data->age” that would do it faster?
Would appreciate some guidance. Thanks.
Doing it the way you are is OK, though your code can easily be made more concise:
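As a minimal sketch of one way to shorten it (assuming Python 3; the in-memory sample and its column names are made up for illustration and stand in for your real file), the reader can be turned into a list directly and the header read from its fieldnames attribute:

```python
import csv
import io

# Hypothetical sample data standing in for the spreadsheet file;
# with a real file, use open(filename, newline='') instead.
sample = io.StringIO("name,age\nann,31\nbob,45\n")

reader = csv.DictReader(sample)
data = list(reader)        # one dictionary per row
keys = reader.fieldnames   # column names, in header order

# csv returns every value as a string, so convert when pulling out a column.
age = [float(row["age"]) for row in data]
```

The age list can then be passed to numpy's array() as in your own line; the explicit float() is there because the csv module does not guess types.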