I am trying to run a Winsorized regression in pandas for Python. The very helpful user manual offers this example code:
winz = rets.copy()
std_1year = rolling_std(rets, 250, min_periods=20)
cap_level = 3 * np.sign(winz) * std_1year
winz[np.abs(winz) > 3 * std_1year] = cap_level
winz_model = ols(y=winz['AAPL'], x=winz.ix[:, ['GOOG']], window=250)
The fourth line looks wrong to me: shouldn’t the RHS be cap_level[np.abs(winz) > 3 * std_1year]?
Thanks for the help! I’m still new to pandas DataFrames and want to make sure I’m understanding this correctly.
Edit: sorry, misunderstood the question!
You’re correct that this would be wrong for most types; however, pandas.DataFrame has special support for setting values using a Boolean mask: for each masked position, it selects the value from the RHS at the same index (time) and column. Under the hood it’s using np.putmask. You can check this for yourself:
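Here is a minimal sketch of such a check. The frames `df` and `cap` are hypothetical toy data, not the `rets` frame from the question, and the threshold of 3 is arbitrary. It assumes a reasonably recent pandas version and compares the masked assignment from the manual with the explicit form proposed in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the returns and the cap levels.
df = pd.DataFrame({"a": [1.0, -5.0, 2.0], "b": [4.0, 0.5, -6.0]})
cap = pd.DataFrame({"a": [10.0, 20.0, 30.0], "b": [40.0, 50.0, 60.0]})

# Boolean DataFrame: True where the absolute value exceeds the threshold.
mask = np.abs(df) > 3

# Form used in the manual: the RHS is the full frame.
implicit = df.copy()
implicit[mask] = cap

# Form proposed in the question: the RHS is masked explicitly.
explicit = df.copy()
explicit[mask] = cap[mask]

print(implicit)
print(implicit.equals(explicit))
```

Both assignments should give the same frame: only the entries where `mask` is True are replaced, each with the value from `cap` at the same row and column, and every other entry keeps its original value.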