I’m playing with NumPy and SciPy and I’m having trouble finding a feature in the documentation, so I was wondering if anyone could help.
Suppose I have a NumPy array with two columns and k rows. One column serves as a numerical indicator (e.g. 2 = male, 1 = female, 0 = unknown), while the second column holds the corresponding values or scores.
Let's say I want to find the standard deviation (it could be the mean or anything else; I just want to apply a function) of the values for all rows with indicator 0, then for 1, and finally for 2.
Is there a predefined function that does this for me?
In R, the equivalent can be found in the plyr package. Do NumPy and/or SciPy have an equivalent, or am I stuck creating a mask for this array, filtering the rows with that mask, and then applying my function?
As always, thanks for your help!
If I understand your description, you have a two-column dataset d: the indicator in the first column and the value in the second.
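A small stand-in for such a dataset is sketched below. The numbers are made up purely for illustration (they are not from the original post); the first column is the indicator and the second is the score. Because the indicator and score share one array, NumPy stores everything as floats.

```python
import numpy as np

# Hypothetical example data, invented for illustration only:
# column 0 = indicator (0 = unknown, 1 = female, 2 = male), column 1 = score.
d = np.array([
    [0, 2.5],
    [1, 3.0],
    [2, 4.5],
    [1, 3.5],
    [0, 1.5],
    [2, 5.5],
    [2, 6.5],
])

print(d.shape)  # 7 rows, 2 columns
print(d.dtype)  # float64, since the scores are floats
```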
In this situation, numpy.unique can be used to generate an array of the unique “key” values, and those values can then drive a generator expression that computes the statistic for each key.
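A minimal sketch of those two steps, assuming the same made-up two-column array d as above: numpy.unique returns the sorted distinct indicators, and the generator applies a boolean mask per key before calling .std() on the matching scores. The names index and g are chosen for illustration. Note that ndarray.std defaults to the population standard deviation (ddof=0); R's sd uses the sample version, which corresponds to .std(ddof=1).

```python
import numpy as np

# Hypothetical example data: column 0 = indicator, column 1 = score.
d = np.array([
    [0, 2.5], [1, 3.0], [2, 4.5], [1, 3.5],
    [0, 1.5], [2, 5.5], [2, 6.5],
])

# Sorted array of the distinct indicator values in column 0.
index = np.unique(d[:, 0])

# Lazily yields, for each key, the std of the scores in rows whose
# indicator equals that key (boolean mask on column 0).
g = (d[d[:, 0] == key, 1].std() for key in index)
```

Swapping .std() for .mean(), or for any other function that takes a 1-D array, gives the "apply whatever function" behaviour asked about.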
The generator g will emit the standard deviation of all the entries in d that match each entry in the index. numpy.fromiter can then be used to collect the results into an array, which can be stacked alongside the keys.

Note that the keys are converted to floating point in the last step, during stacking. You might not want that depending on your data, but I did it just for illustrative purposes, to have a “nice” looking final result to post.
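The whole pipeline can be sketched end to end as follows, again on the made-up array d. np.fromiter drains the generator into a float array (passing count lets NumPy preallocate it), and np.column_stack pairs each key with its statistic. This is one way to do the stacking, not necessarily the exact call used originally; since d is already a float array here, the keys are floats before stacking.

```python
import numpy as np

# Hypothetical example data: column 0 = indicator, column 1 = score.
d = np.array([
    [0, 2.5], [1, 3.0], [2, 4.5], [1, 3.5],
    [0, 1.5], [2, 5.5], [2, 6.5],
])

index = np.unique(d[:, 0])
g = (d[d[:, 0] == key, 1].std() for key in index)

# Collect the generator's output into a 1-D float array.
stds = np.fromiter(g, dtype=float, count=len(index))

# One row per key: [key, std of that key's scores]. The result is a single
# float array, so integer keys would also end up as floats here.
result = np.column_stack((index, stds))
print(result)
```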