I am trying to use the `df.apply()` function in pandas but am getting the following error. The function is meant to convert every entry to 0 if it is less than `threshold`:
```python
from pandas import *
import numpy as np

def discardValueLessThan(x, threshold):
    # Return 0 for values below the threshold, otherwise the value itself
    if x < threshold: return 0
    else: return x

df = DataFrame(np.random.randn(8, 3), columns=['A', 'B', 'C'])
```
```
>>> df
          A         B         C
0 -1.389871  1.362458  1.531723
1 -1.200067 -1.114360 -0.020958
2 -0.064653  0.426051  1.856164
3  1.103067  0.194196  0.077709
4  2.675069 -0.848347  0.152521
5 -0.773200 -0.712175 -0.022908
6 -0.796237  0.016256  0.390068
7 -0.413894  0.190118 -0.521194
```
```
>>> df.apply(discardValueLessThan, 0.1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.8.1-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 3576, in apply
    return self._apply_standard(f, axis)
  File "/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.8.1-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 3637, in _apply_standard
    e.args = e.args + ('occurred at index %s' % str(k),)
UnboundLocalError: local variable 'k' referenced before assignment
```
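To see why the call misbehaves, it can help to check what `apply` actually hands the function. The sketch below assumes a current pandas install (the traceback above is from pandas 0.8.1, so the exact error messages will differ) and uses a hypothetical helper, `record_call`, purely for inspection. It illustrates three things: extra arguments can travel through `args=` or as keywords, a bare second positional argument lands in the `axis` slot, and the function receives whole columns rather than single values.

```python
import numpy as np
import pandas as pd

def discardValueLessThan(x, threshold):
    # Same logic as the question; only works when x is a single number.
    if x < threshold:
        return 0
    return x

df = pd.DataFrame(np.random.default_rng(0).standard_normal((8, 3)),
                  columns=["A", "B", "C"])

# Hypothetical helper that records what apply() passes to the function.
seen = []

def record_call(col, threshold):
    seen.append((type(col).__name__, col.name, threshold))
    return col  # return the column unchanged

df.apply(record_call, args=(0.1,))    # extra positional args go in `args`
df.apply(record_call, threshold=0.1)  # or pass them as keyword arguments
print(seen)  # each entry is ('Series', column label, 0.1)

# The second positional slot of apply() is `axis`, not `threshold`,
# so this call fails instead of using 0.1 as the threshold.
positional_error = None
try:
    df.apply(discardValueLessThan, 0.1)
except Exception as exc:
    positional_error = exc
print("positional:", type(positional_error).__name__, positional_error)

# With the threshold passed correctly, x is a whole column, so
# `x < threshold` is a boolean Series and `if` cannot reduce it to True/False.
series_error = None
try:
    df.apply(discardValueLessThan, args=(0.1,))
except ValueError as exc:
    series_error = exc
print("series:", series_error)
```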
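The error message looks like a pandas bug to me, but I think there are two other problems.

First, I think you have to either specify named parameters or use `args` to pass additional arguments to `apply`. Your second argument is probably being interpreted as an axis. But if you use `df.apply(discardValueLessThan, args=(0.1,))` or `df.apply(discardValueLessThan, threshold=0.1)`, then you'll get a `ValueError` about an ambiguous truth value, because `apply` doesn't act elementwise: it acts on entire Series objects (whole columns, by default), so `x < threshold` inside your function is a boolean Series rather than a single `True` or `False`.

Other approaches include using `applymap` or boolean indexing, e.g. `df.applymap(lambda x: discardValueLessThan(x, 0.1))`, or simply `df[df < 0.1] = 0`.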
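Here is a runnable sketch of those elementwise options on a current pandas install. It assumes pandas 2.1 or later, where `applymap` was renamed `DataFrame.map`, and falls back to `applymap` on older versions. `DataFrame.mask` is an extra vectorised option that the answer above does not mention. All three should produce the same frame.

```python
import numpy as np
import pandas as pd

def discardValueLessThan(x, threshold):
    # Scalar version from the question: replace values below threshold with 0.
    if x < threshold:
        return 0
    return x

threshold = 0.1
df = pd.DataFrame(np.random.default_rng(0).standard_normal((8, 3)),
                  columns=["A", "B", "C"])

# 1. Elementwise function call. DataFrame.map exists in pandas >= 2.1;
#    older versions only have applymap, which behaves the same way.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
via_map = elementwise(lambda v: discardValueLessThan(v, threshold))

# 2. Boolean indexing: assign 0 wherever the condition is True.
#    Work on a copy so that df itself stays unchanged.
via_indexing = df.copy()
via_indexing[via_indexing < threshold] = 0

# 3. Vectorised alternative (not in the original answer): mask() replaces
#    values where the condition holds and returns a new DataFrame.
via_mask = df.mask(df < threshold, 0)

print(via_indexing)
```

Boolean indexing and `mask` avoid calling a Python function once per cell, so on large frames they are usually the better choice; the `map`/`applymap` route is mainly useful when the per-value logic cannot be written as a vectorised condition.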