I have a DataFrame of values and would like to explore the rows that are outliers. I wrote the function below, which is called through groupby().apply(). It works well for either high or low values, but when I try to combine the two conditions I get an error. I am somehow getting the boolean OR selection wrong, but the only documentation I could find covers selection criteria using &. Any suggestions would be appreciated.
zach cp
from pandas import DataFrame
from numpy import mean, std
df = DataFrame( {'a': [1,1,1,2,2,2,2,2,2,2], 'b': [5,5,6,9,9,9,9,9,9,20] } )
#this works fine
def get_outliers(group):
    x = mean(group.b)
    y = std(group.b)
    top_cutoff = x + 2*y
    bottom_cutoff = x - 2*y
    cutoffs = group[group.b > top_cutoff]
    return cutoffs
#this will trigger an error
def get_all_outliers(group):
    x = mean(group.b)
    y = std(group.b)
    top_cutoff = x + 2*y
    bottom_cutoff = x - 2*y
    cutoffs = group[(group.b > top_cutoff) or (group.b < bottom_cutoff)]
    return cutoffs
#works fine
grouped1 = df.groupby(['a']).apply(get_outliers)
#triggers error
grouped2 = df.groupby(['a']).apply(get_all_outliers)
You need to use `|` instead of `or`. The `and` and `or` operators are special in Python and don’t interact well with things like numpy and pandas that try to apply them elementwise across a collection. So for these contexts, those libraries have redefined the “bitwise” operators `&` and `|` to mean “and” and “or”.
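As a minimal sketch of the fix, here is the asker's function with `|` in place of `or`. A few things in it are my own choices rather than part of the original code: it uses the pandas `.mean()` and `.std()` methods instead of the numpy functions (pandas' `std` defaults to the sample standard deviation, `ddof=1`, whereas numpy's defaults to `ddof=0`), it passes `group_keys=False` to keep the original row index, and it compares the low side against `bottom_cutoff`. The parentheses around each comparison are required, because `|` binds more tightly than `>` and `<`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                   "b": [5, 5, 6, 9, 9, 9, 9, 9, 9, 20]})

def get_all_outliers(group):
    """Return rows of `group` whose b lies more than 2 std from the group mean."""
    center = group["b"].mean()
    spread = group["b"].std()  # assumption: sample std (ddof=1), the pandas default
    top_cutoff = center + 2 * spread
    bottom_cutoff = center - 2 * spread
    # Elementwise OR: each comparison is parenthesised, then combined with |
    mask = (group["b"] > top_cutoff) | (group["b"] < bottom_cutoff)
    return group[mask]

# Using `or` on two boolean Series raises ValueError, because Python asks
# the whole Series for a single truth value.
try:
    (df["b"] > 10) or (df["b"] < 6)
    or_error = None
except ValueError as exc:
    or_error = exc

outliers = df.groupby("a", group_keys=False).apply(get_all_outliers)
print(outliers["b"].tolist())  # [20]
```

With the sample data, only the row with b = 20 in group a = 2 falls outside the cutoffs. The same pattern works for "and": write `(cond1) & (cond2)`.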