I have a CSV dataset with 40 columns and roughly 800 rows.
As an example, let’s say it looks like this:
Ref X Y
11 1 10
11 2 9
11 3 8
11 4 7
12 5 6
12 6 5
12 7 4
13 8 3
13 9 2
How would you define a function that returns lists of the average X and Y values for each Ref, i.e. something like:
Ref_list = [11,12,13]
Av_X = [2.5,6,12.5]
I doubt this is the best way to approach it, but I’ve written the following code:
import collections
import numpy as np

my_data = np.genfromtxt('somedata.csv', delimiter=',', skip_header=1)
X = []
for i in my_data:
    X.append(i[0])
counter = collections.Counter(X)
keys = np.sort(list(counter.keys()))  # find and sort the Ref key values

def getdata():
    X, Y = [], []
    for i in my_data:
        if i[0] == refs:  # refs is the global loop variable set below
            X.append(i[1])
            Y.append(i[2])
    AV_X = np.average(X)
    AV_Y = np.average(Y)
    return AV_X, AV_Y

for refs in keys:  # run the function over the key range
    AV_X, AV_Y = getdata()
Here I get stuck. I was trying to iterate the function over the range of Ref numbers (keys) and append the returned values, but apart from errors, I can only get the values for the last Ref in keys.
I imagine there is a better way to do this, but I’m still a newbie to this stuff.
Many thanks in advance for any suggestions.
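Only the last Ref survives because `AV_X, AV_Y = getdata()` rebinds the same two names on every pass through the loop, so each result overwrites the previous one. A minimal NumPy-only sketch of the append approach, assuming the three-column layout (Ref, X, Y) from the example: it passes the Ref in as an argument instead of reading the global `refs`, and hard-codes the example table in place of `somedata.csv`. The names `get_averages`, `ref_list`, `av_x` and `av_y` are hypothetical, chosen for illustration.

```python
import numpy as np

# Hypothetical stand-in for somedata.csv: columns are Ref, X, Y.
my_data = np.array([
    [11, 1, 10], [11, 2, 9], [11, 3, 8], [11, 4, 7],
    [12, 5, 6], [12, 6, 5], [12, 7, 4],
    [13, 8, 3], [13, 9, 2],
], dtype=float)

def get_averages(data, ref):
    """Return (mean X, mean Y) for the rows whose Ref column equals ref."""
    rows = data[data[:, 0] == ref]  # boolean mask keeps only this Ref's rows
    return rows[:, 1].mean(), rows[:, 2].mean()

ref_list, av_x, av_y = [], [], []
for ref in np.unique(my_data[:, 0]):  # np.unique returns the Refs already sorted
    x_mean, y_mean = get_averages(my_data, ref)
    # Append inside the loop so every Ref's result is kept.
    ref_list.append(int(ref))
    av_x.append(float(x_mean))
    av_y.append(float(y_mean))

print(ref_list)
print(av_x)
print(av_y)
```

The key change is that the results are appended inside the loop rather than assigned to the same names each time.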
You can use the brilliant pandas library for this kind of job: read the CSV into a DataFrame, group the rows by Ref, and take the mean of each group.
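A minimal sketch of that approach, assuming the grouping column is called `Ref` as in the example. To keep it self-contained, the example table is embedded as a string (`csv_text` is a placeholder); with the real file you would pass `'somedata.csv'` to `pd.read_csv` instead. `groupby('Ref').mean()` averages X and Y for each Ref in one step, and `.tolist()` turns the columns into the plain lists asked for in the question.

```python
import io
import pandas as pd

# Placeholder for somedata.csv: the example table as comma-separated text.
csv_text = """Ref,X,Y
11,1,10
11,2,9
11,3,8
11,4,7
12,5,6
12,6,5
12,7,4
13,8,3
13,9,2
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group rows by Ref and average every remaining column (here X and Y).
means = df.groupby('Ref').mean()
print(means)

# Convert the result to plain Python lists.
Ref_list = means.index.tolist()
Av_X = means['X'].tolist()
Av_Y = means['Y'].tolist()
```

With all 40 columns loaded, the same call averages every column per Ref; if some columns are non-numeric, recent pandas versions need `mean(numeric_only=True)`.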
Note that you miscalculated in your question: for Ref 13 the average X is (8 + 9) / 2 = 8.5, not 12.5.
You can likewise ask for the median, sum, max, etc.
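A short sketch, assuming the same `Ref`/`X`/`Y` column names: each statistic has its own groupby method following the `.mean()` pattern, and `agg` accepts a list of function names to compute several at once. The data frame is built inline here for brevity.

```python
import pandas as pd

# Same example data, built directly as a DataFrame.
df = pd.DataFrame({
    'Ref': [11, 11, 11, 11, 12, 12, 12, 13, 13],
    'X':   [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'Y':   [10, 9, 8, 7, 6, 5, 4, 3, 2],
})

grouped = df.groupby('Ref')

# Single statistics follow the same pattern as .mean():
medians = grouped.median()
totals = grouped.sum()
maxima = grouped.max()

# Several statistics at once; the result has two-level column labels
# such as ('X', 'median'), ('X', 'sum') and ('X', 'max').
summary = grouped.agg(['median', 'sum', 'max'])
print(summary)
```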