I’m trying to efficiently sum the elements of separate data arrays by their characteristics. I have three identifying characteristics (age, year, and cause) in a given array, and for each combination of age, year, and cause, I have 1000 values. I need to add those values to the values in another data array when the characteristics are the same. For now, I’m doing something like this, where each dataset is ~ (80000, 1000):
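import numpy as np
# np.vstack takes a single sequence of arrays, so pass them as a tuple
datasets = np.vstack((dataset1, dataset2))
for a in ages:
    for y in years:
        for c in causes:
            # Sum every row whose age, year, and cause match this combination
            output = np.sum(datasets[(age == a) & (year == y) & (cause == c)], axis=0)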
However, with 60,000 iterations, this is incredibly slow. The challenge is that the arrays don’t necessarily all have the same shape. Any thoughts?
SEE LINK BELOW
I’m not sure how to properly link another answer to this answer. When I tried one sentence followed by the link, it was converted to a comment. I’m now being long-winded to try to make Stack Overflow treat this text as long enough to constitute an answer. Here is the link to a great answer to this question.
Summing Arrays by Characteristics in Python
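For reference, here is a minimal sketch of one common way to avoid the triple loop. It is not necessarily what the linked answer does. It assumes the characteristics are stored as numeric codes in three 1-D arrays aligned with the rows of the stacked data, and `grouped_sum` is a name I made up for illustration. The idea is to label each row with the index of its unique (age, year, cause) combination and then accumulate rows by label.

```python
import numpy as np

def grouped_sum(values, age, year, cause):
    """Sum the rows of `values` that share the same (age, year, cause).

    Hypothetical helper: assumes age, year, and cause are 1-D numeric
    arrays with one entry per row of `values`.
    """
    # One row of keys per data row: [age, year, cause]
    keys = np.column_stack((age, year, cause))
    # unique_keys holds each distinct combination once (sorted);
    # inverse maps every data row to the index of its combination
    unique_keys, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against a non-1-D inverse
    sums = np.zeros((unique_keys.shape[0], values.shape[1]), dtype=values.dtype)
    # Unbuffered add, so repeated indices accumulate instead of overwriting
    np.add.at(sums, inverse, values)
    return unique_keys, sums

# Tiny made-up example: rows 0 and 1 share the same characteristics
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
age = np.array([1, 1, 2])
year = np.array([2000, 2000, 2000])
cause = np.array([0, 0, 1])
unique_keys, sums = grouped_sum(values, age, year, cause)
```

Because the datasets are stacked before grouping, it does not matter that they have different numbers of rows. Only the number of columns has to match. `np.add.at` can be slow on very large inputs, so this may still need tuning at (160000, 1000).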