I’m trying to index data by their probability (estimated with a simple histogram). The objective is to select items in the series with a probability less than some threshold.
I have a series of integer values, for example:
import pandas as pnd
import numpy as np
series = pnd.Series(np.random.poisson(5, size = 100))
then I calculate their histogram like this:
tmp = {"series" : series, "count" : np.ones(len(series))}
hist = pnd.DataFrame(tmp).groupby("series").sum()
freq = hist / hist.sum()
So now I have the frequency of each result, indexed by the result, and the series of results. I now have two questions:
- Is there a way to index series by the mapping of result/frequency defined by freq?
- If I manage to do this, how do I select only results with frequency greater than some value?
Thanks.
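Yes, use the map method of Series to look up the frequency of each item. You can then select items by comparing the mapped frequencies against a threshold.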
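The code from the original answer did not survive, so here is a minimal sketch of what this could look like, reusing the names from the question. Note that freq as computed above is a one-column DataFrame, so the sketch pulls out its "count" column before passing it to map. The names freq_by_value, item_freq, threshold, frequent and rare, the threshold value of 0.1 and the fixed random seed are my own assumptions for illustration.

```python
import numpy as np
import pandas as pnd

np.random.seed(0)  # assumption: fixed seed so the example is repeatable
series = pnd.Series(np.random.poisson(5, size=100))

# Histogram and relative frequencies, as in the question
tmp = {"series": series, "count": np.ones(len(series))}
hist = pnd.DataFrame(tmp).groupby("series").sum()
freq = hist / hist.sum()

# freq is a one-column DataFrame; map needs a Series indexed by the values
freq_by_value = freq["count"]

# For each item in series, look up the frequency of its value
item_freq = series.map(freq_by_value)

# Boolean indexing on the mapped frequencies
threshold = 0.1  # hypothetical value, pick whatever suits your data
frequent = series[item_freq > threshold]  # values seen more often than threshold
rare = series[item_freq < threshold]      # values seen less often than threshold
```

If you do not need the intermediate histogram, series.value_counts(normalize=True) should give the same frequencies directly as a Series, which can be passed to map in the same way.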