I have a table that looks like this:
id value
AGA 0.211
AGA 0.433
AGA 0.123
AGH 0.002
DHI 0.063
DHI 0.193
DHI 0.004
KHI 0.543
KHI 0.064
HID 0.234
Each id can have several values. I want to count how many entries there are for each id, and compute the sum and the average of the values for each id, so the outcome would be something like this:
id cnt sum av
AGA 3 0.767 0.256
AGH 1 0.002 0.002
DHI 3 0.26 0.087
KHI 2 0.607 0.304
HID 1 0.234 0.234
I thought it would be best to first make a dictionary where I count each entry, but I got stuck after that. I don't know whether the dictionary value should be an array (with the cnt, sum and av), or how to then use the cnt to calculate the rest. This is how far I got:
idDict = {}
for line in file:
    line = line.rstrip()
    f = line.split()
    id = f[0]
    idDict[id] = idDict.get(id, 0) + 1
But once I have created the dictionary with the cnt, I don't know how to iterate over each id to do the sum and av calculations 🙁
Since the data in your table seems to be sorted by id (all rows for an id are adjacent), there is actually no need to put everything in a dictionary first, although a dictionary might make things clearer. But I guess your table could get quite big, so storing everything a second time would waste memory.
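A minimal sketch of that single-pass idea, assuming rows with the same id are always adjacent, every row has exactly two whitespace-separated columns, and the `id value` header line has already been skipped. The function name `summarize_sorted` and the inline `sample` data are made up for illustration:

```python
from itertools import groupby


def summarize_sorted(lines):
    """Yield (id, count, total, average) for each run of adjacent rows sharing an id."""
    # Split each non-blank line into [id, value]; nothing is stored beyond the current row.
    rows = (line.split() for line in lines if line.strip())
    # groupby only merges *adjacent* rows with the same key, hence the sortedness assumption.
    for key, group in groupby(rows, key=lambda fields: fields[0]):
        count = 0
        total = 0.0
        for _, value in group:
            count += 1
            total += float(value)
        yield key, count, total, total / count


# Sample rows taken from the question (header line omitted).
sample = """AGA 0.211
AGA 0.433
AGA 0.123
AGH 0.002
DHI 0.063
DHI 0.193
DHI 0.004
KHI 0.543
KHI 0.064
HID 0.234""".splitlines()

print("id cnt sum av")
for key, count, total, average in summarize_sorted(sample):
    print(key, count, round(total, 3), round(average, 3))
```

With a real file you could pass the open file object instead of `sample`, calling `next(f)` first to skip the header.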
I haven't tested that code, but you should get the idea. It looks a little complicated, but once you have more than 100k lines or so, you should notice a difference compared with first loading everything into memory and then working on it.
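If the rows are not guaranteed to be grouped by id, the dictionary approach from the question also works without keeping every row around: store only a running count and sum per id, and derive the average at the end. This is a sketch under assumptions, not a definitive implementation. The function name `summarize` is hypothetical, and the first line is assumed to be the `id value` header:

```python
def summarize(lines):
    """Return {id: [count, total]}, built in one pass; row order does not matter."""
    stats = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        key, value = fields[0], float(fields[1])
        # Each dictionary value is a small list: [count, running sum].
        entry = stats.setdefault(key, [0, 0.0])
        entry[0] += 1
        entry[1] += value
    return stats


# Sample table from the question, including the header line.
sample = """id value
AGA 0.211
AGA 0.433
AGA 0.123
AGH 0.002
DHI 0.063
DHI 0.193
DHI 0.004
KHI 0.543
KHI 0.064
HID 0.234""".splitlines()

stats = summarize(sample[1:])  # [1:] skips the header line

print("id cnt sum av")
for key, (count, total) in stats.items():
    # The average is computed from the stored count and sum.
    print(key, count, round(total, 3), round(total / count, 3))
```

Memory use grows with the number of distinct ids rather than the number of lines, so this stays cheap even for a large table.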