I’m new to Python and have what is probably a very basic question about the ‘best’ way to store data in my code. Any advice much appreciated!
I have a long .csv file in the following format:
Scenario,Year,Month,Value
1,1961,1,0.5
1,1961,2,0.7
1,1961,3,0.2
etc.
My scenario values run from 1 to 100, year goes from 1961 to 1990 and month goes from 1 to 12. My file therefore has 100*29*12 = 34800 rows, each with an associated value.
I’d like to read this file into some kind of Python data structure so that I can access a ‘Value’ by specifying the ‘Scenario’, ‘Year’ and ‘Month’. What’s the best way to do this please (or what are the various options)?
In my head I think of this data as a kind of ‘number cuboid’ with axes for Scenario, Year and Month, so that each Value is located at co-ordinates (Scenario, Year, Month). For this reason, I’m tempted to try to read these values into a 3D numpy array and use Scenario, Year and Month as indices. Is this a sensible thing to do?
I guess I could also make a dictionary where the keys are something like
str(Scenario)+str(Year)+str(Month)
Would this be better? Are there other options?
(By ‘better’ I suppose I mean ‘faster to access’, although if one method is much less memory intensive than another it’d be good to know about that too).
Thanks very much!
I’d use a dict of tuples. Simple, fast, and a hash-table look-up to retrieve a single value: