The following code snippet is from one of my functions. The function is passed a list of numbers and is supposed to remove outliers (i.e. very large or very small numbers) from the list.
The code does not seem to work as intended, as the output confirms:
EXTREMA_CUTOFF_THRESHOLD = 3.0
if list_values:
    avg_val = sum(list_values)/float(len(list_values))
    print 'DEBUG: BEFORE:', min(list_values), max(list_values), avg_val
    list_values = [x for x in list_values if math.fabs(x - avg_val)/float(avg_val) < EXTREMA_CUTOFF_THRESHOLD]
    list_values_len = len(list_values)
    if (list_values_len > 0) and (min_sample_size > 0) and (list_values_len < min_sample_size):
        print 'DEBUG: Insufficient data for stats calculation for row'
    elif (list_values_len > 0):
        print 'DEBUG: AFTER:', min(list_values), max(list_values), avg_val
Output:
DEBUG: BEFORE: 11.0 302.0 113.897260274
DEBUG: AFTER: 11.0 302.0 113.897260274
DEBUG: BEFORE: 12.5 273.0 108.382352941
DEBUG: AFTER: 12.5 273.0 108.382352941
DEBUG: BEFORE: 2.5 245.5 69.9166666667
DEBUG: AFTER: 2.5 245.5 69.9166666667
DEBUG: BEFORE: 136.5 499.5 363.775
DEBUG: AFTER: 136.5 499.5 363.775
DEBUG: BEFORE: 39.5 422.5 166.035759097
DEBUG: AFTER: 39.5 422.5 166.035759097
DEBUG: BEFORE: 39.5 422.0 152.305007587
DEBUG: AFTER: 39.5 422.0 152.305007587
DEBUG: BEFORE: 20.5 331.0 84.41015625
DEBUG: AFTER: 20.5 331.0 84.41015625
DEBUG: BEFORE: 7.0 267.5 155.810126582
DEBUG: AFTER: 7.0 267.5 155.810126582
Why are the extreme values not being filtered out?
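Your code is working. It’s just that none of the extreme values is more than 3 times the average away from the average, which is what your algorithm requires before it drops a value. The test fabs(x - avg_val)/avg_val < 3.0 keeps every x that lies strictly between -2 * avg_val and 4 * avg_val. In your first row the average is about 113.9, so a value would have to reach about 455.6 to be removed, and the maximum is only 302.0. The lower bound is negative, so in the rows shown, where every minimum is positive, small values can never be removed by this test.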
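
To make the point in the answer above concrete, here is a minimal sketch that computes the range of values the filter keeps and applies the same test to the minimum and maximum of the first output row. The helper names kept_range and filter_relative are hypothetical (they are not in the original code), the two-element sample list stands in for the real data, which is not shown in the question, and the threshold of 1.5 is only an illustrative alternative, not a recommendation. The sketch avoids print statements so that it runs on Python 2 and Python 3 alike.

```python
import math

EXTREMA_CUTOFF_THRESHOLD = 3.0


def kept_range(avg_val, threshold=EXTREMA_CUTOFF_THRESHOLD):
    """Return (low, high): the open interval of values the filter keeps.

    Assumes avg_val > 0. The test fabs(x - avg) / avg < t is equivalent
    to avg * (1 - t) < x < avg * (1 + t).
    """
    return avg_val * (1 - threshold), avg_val * (1 + threshold)


def filter_relative(values, avg_val, threshold=EXTREMA_CUTOFF_THRESHOLD):
    """Same filtering expression as in the question, wrapped in a function."""
    return [x for x in values
            if math.fabs(x - avg_val) / float(avg_val) < threshold]


# Average, minimum and maximum taken from the first "BEFORE" line.
avg_val = 113.897260274
sample = [11.0, 302.0]  # stand-in data: only the min and max of that row

low, high = kept_range(avg_val)
# low is negative and high is roughly 455.6, so both 11.0 and 302.0 survive.
kept_default = filter_relative(sample, avg_val)

# With a tighter, purely illustrative threshold the upper bound drops to
# roughly 284.7, so 302.0 is removed while 11.0 is still kept.
kept_tighter = filter_relative(sample, avg_val, threshold=1.5)
```

Note that the lower bound only becomes positive when the threshold is below 1.0, so with this relative-deviation test and positive data, small values are removed only at thresholds that also cut fairly close to the average on the high side.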