I am researching/backtesting a trading system.
I have a Pandas dataframe containing OHLC data and have added several calculated columns which identify price patterns that I will use as signals to initiate positions.
I would now like to add a further column that keeps track of the current net position. I have tried using df.apply(), passing the dataframe itself as the argument rather than the row object, because with the row alone I seem to be unable to look back at previous rows to determine whether they resulted in any price patterns:
from collections import namedtuple

open_campaigns = []
Campaign = namedtuple('Campaign', 'open position stop')

def calc_position(df):
    # sum of current positions + any new positions
    if entered_long(df):
        open_campaigns.append(
            Campaign(
                calc_long_open(df.High.shift(1)),
                calc_position_size(df),
                calc_long_isl(df)
            )
        )
    return sum(campaign.position for campaign in open_campaigns)

def entered_long(df):
    return buy_pattern(df) & (df.High > df.High.shift(1))

df["Position"] = df.apply(lambda row: calc_position(df), axis=1)
However, this returns the following error:
ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1997-07-16 08:00:00')
Rolling window functions would seem to be the natural fit, but as I understand it, they only act on a single time series or column, so wouldn’t work either as I need to access the values of multiple columns at multiple timepoints.
How should I in fact be doing this?
This problem has its roots in NumPy.
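entered_long is returning an array-like object, and NumPy refuses to guess whether an array as a whole is True or False. That is why the if statement in calc_position raises the ValueError.
To fix this, use any or all to specify what you mean for an array to be True, for example by testing entered_long(df).any() in the if statement.
The any() method will return True if any of the items in entered_long(df) are True.
The all() method will return True if all the items in entered_long(df) are True.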
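
As a minimal sketch of the behaviour described above, the snippet below builds a toy High column (the values are made up for illustration and are not from the question), forms the same kind of element-wise comparison that entered_long uses, and shows that asking for its truth value fails while any() and all() give a definite answer. The helper truth_value_is_ambiguous is a hypothetical name introduced only for this example.

```python
import pandas as pd

# Toy data; the values are illustrative assumptions, not real prices.
df = pd.DataFrame({"High": [10.0, 11.0, 10.5, 12.0]})

# Element-wise comparison: one boolean per row, like entered_long(df).
signal = df.High > df.High.shift(1)


def truth_value_is_ambiguous(values):
    """Hypothetical helper: True if bool(values) raises ValueError."""
    try:
        bool(values)
    except ValueError:
        return True
    return False


# Both the pandas Series and the underlying NumPy array refuse to be
# collapsed into a single True/False.
series_ambiguous = truth_value_is_ambiguous(signal)
array_ambiguous = truth_value_is_ambiguous(signal.to_numpy())

# Stating the intent explicitly removes the ambiguity.
any_true = bool(signal.any())  # at least one row satisfies the condition
all_true = bool(signal.all())  # every row satisfies the condition

# A single element is an ordinary boolean and can be used in an `if`.
latest = bool(signal.iloc[-1])
```

Note that because calc_position is handed the whole dataframe, any() answers whether the pattern occurred anywhere in the data. If the intent is to test only one bar, indexing a single element (as with signal.iloc[-1] here) may be closer to what is wanted, though that depends on how the rest of the backtest is structured.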