I’m trying to build a function that returns the products of subsets of array elements. Basically I want to build a prod_by_group function that does this:
values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])
Vprods = prod_by_group(values, groups)
And the resulting Vprods should be:
Vprods
array([6, 4, 30])
There’s a great answer here for sums of elements that I think it should be similar to:
https://stackoverflow.com/a/4387453/1085691
I tried taking the log first, then sum_by_group, then exp, but ran into numerical issues.
There are some other similar answers here for min and max of elements by group:
https://stackoverflow.com/a/8623168/1085691
Edit: Thanks for the quick answers! I’m trying them out. I should add that I want it to be as fast as possible (that’s the reason I’m trying to get it in numpy in some vectorized way, like the examples I gave).
Edit: I evaluated all the answers given so far, and the best one is given by @seberg below. Here’s the full function that I ended up using:
def prod_by_group(values, groups):
order = np.argsort(groups)
groups = groups[order]
values = values[order]
group_changes = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))
return np.multiply.reduceat(values, group_changes)
If you groups are already sorted (if they are not you can do that with
np.argsort), you can do this using thereduceatfunctionality toufuncs (if they are not sorted, you would have to sort them first to do it efficiently):Or mgilson answer if you have few groups. But if you have many groups, then this is much more efficient. Since you avoid boolean indices for every element in the original array for every group. Plus you avoid slicing in a python loop with reduceat.
Of course pandas does these operations conveniently.
Edit: Sorry had
prodin there. The ufunc ismultiply. You can use this method for any binaryufunc. This means it works for basically all numpy functions that can work element wise on two input arrays. (ie. multiply normally multiplies two arrays elementwise, add adds them, maximum/minimum, etc. etc.)