I’m working with data like this:
Sample Detector Cq
P_1 106 23.53152
P_1 106 23.152458
P_1 106 23.685083
P_1 135 24.465698
P_1 135 23.86892
P_1 135 23.723469
P_1 17 22.524242
P_1 17 20.658733
P_1 17 21.146122
As suggested in this post, I’m handling that with a MultiIndex. However, I’m wondering how to do some additional checks with such a structure. To explain: each “Sample” has a fixed number of repeated “Detector” entries, from 1 (no duplication) to several duplicates. I want to ensure that every sample has the same number of entries for each detector (i.e., if P_1 has 3 “106” detectors, P_2 should have 3 “106” detectors as well).
Currently I’m doing this rather crudely:
import pandas

def replicate_counter(dataframe, name):
    # Select all rows for one sample (first index level).
    # .loc replaces .ix, which has been removed from pandas.
    subset = dataframe.loc[name]
    # Average number of rows per distinct detector in this sample
    num_replicates = subset.index.size / subset.index.unique().size
    return num_replicates

# Further down...
# dataframe is a MultiIndex DataFrame like above
counts = pandas.Series([replicate_counter(dataframe, item[0]) for item
                        in dataframe.index]).unique()
if counts.size != 1:
    raise ValueError("Detectors not equal for all samples")
This seems very hacky to me, and there are probably better ways to do it in pandas. How could this be accomplished?
Turns out groupby is what is needed to make this clear and concise (and probably more efficient too):
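A minimal sketch of one way to write that check with groupby, assuming the two index levels are named Sample and Detector as in the table above. The helper names (replicate_counts, check_replicates) and the P_2 rows are made up for illustration, and this is not necessarily the exact code originally intended here. It also compares counts detector by detector, which is slightly stricter than the averaged ratio used in the question.

```python
import pandas as pd


def replicate_counts(dataframe):
    """Return a Sample x Detector table of replicate counts."""
    # Count the rows for every (Sample, Detector) pair in the MultiIndex.
    counts = dataframe.groupby(level=["Sample", "Detector"]).size()
    # Pivot detectors into columns; a detector absent from a sample counts as 0.
    return counts.unstack("Detector", fill_value=0)


def check_replicates(dataframe):
    """Raise ValueError unless every sample has the same count per detector."""
    table = replicate_counts(dataframe)
    # Each detector column must hold a single distinct value across samples.
    if not (table.nunique() == 1).all():
        raise ValueError("Detectors not equal for all samples")
    return table


# Hypothetical data: a shortened P_1 in the question's layout, plus an invented P_2.
frame = pd.DataFrame(
    {
        "Sample": ["P_1"] * 5 + ["P_2"] * 5,
        "Detector": [106, 106, 135, 135, 17] * 2,
        "Cq": [23.5, 23.2, 24.5, 23.9, 22.5, 23.1, 23.4, 24.0, 24.2, 21.9],
    }
).set_index(["Sample", "Detector"])

table = check_replicates(frame)
print(table)
```

Going through unstack means a detector that is missing from one sample shows up as a count of 0 and fails the check, rather than being silently skipped.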