I am writing an application whose purpose is to optimize a trading strategy. For


Asked: June 12, 2026


I am writing an application whose purpose is to optimize a trading strategy. For the sake of simplicity, assume that we have one strategy that says “enter here” and another that says “exit here if in a trade”. In addition, let’s have two models: one says how much risk we should take (how much we lose if we’re on the wrong side of the market), and the other says how much profit we should take (i.e. how much we gain if the market moves our way).

For simplicity’s sake, I will refer to historical realized trades as ticks. So “enter on tick 28” means I would have entered a trade at the time of the 28th trade in my dataset, at that trade’s price. Ticks are stored chronologically in my dataset.

Now, imagine the entry strategy comes up with 500 entries over the whole dataset. For each entry, I can precalculate the exact entry tick. I can also calculate the exit point determined by the exit strategy for each entry (again as a tick number). For each entry, I can also precalculate the modeled loss and profit and the ticks at which that loss or profit would have been hit. The only thing left is to determine which would have happened first: exit on strategy, exit on a loss, or exit on a profit.
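As a rough illustration of how the “tick where the loss or profit would have been hit” could be precalculated, here is a minimal sketch. It assumes a long trade, a vector `prices` holding the tick prices in chronological order, and stop/target expressed as price distances from the entry price. The names `firstStopTick` and `firstTargetTick` are hypothetical, not the author’s code; a short trade would flip the comparisons.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: first tick strictly after entryTick at which a long
// trade entered at prices[entryTick] reaches its stop (loss) level.
// Returns prices.size() as a "never hit" sentinel.
std::size_t firstStopTick(const std::vector<double>& prices,
                          std::size_t entryTick, double stopDistance) {
    const double stopLevel = prices[entryTick] - stopDistance;
    for (std::size_t t = entryTick + 1; t < prices.size(); ++t) {
        if (prices[t] <= stopLevel) return t;
    }
    return prices.size();
}

// Hypothetical helper: same scan for the target (profit) level.
std::size_t firstTargetTick(const std::vector<double>& prices,
                            std::size_t entryTick, double targetDistance) {
    const double targetLevel = prices[entryTick] + targetDistance;
    for (std::size_t t = entryTick + 1; t < prices.size(); ++t) {
        if (prices[t] >= targetLevel) return t;
    }
    return prices.size();
}
```

Using `prices.size()` as the sentinel keeps “never hit” larger than any real tick, so it drops out naturally when the minimum of the exit ticks is taken.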

Hence, I iterate through the array of trades and calculate exitTick[i] = min(exitTickByStrat[i], exitTickByLoss[i], exitTickByProfit[i]). The whole process is bloody slow (let’s say I do this 100M times). I suspect cache misses are the main culprit. So the question is: can this be made faster somehow? I have to iterate through 4 arrays of non-trivial length. One idea I have come up with is to group the data in tuples of four, i.e. have one array of structures like (entryTick, exitOnStrat, exitOnLoss, exitOnProfit). This might be faster thanks to more predictable memory access, but I cannot say for sure. I haven’t tested it yet because instrumenting profilers somehow don’t work with release builds of my app, while sampling profilers seem unreliable to me (I have tried Intel’s profiler).
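A minimal sketch of the two memory layouts being compared, assuming 32-bit tick indices; the type and function names are hypothetical. Both functions compute the same three-way minimum; the only difference is whether the exit ticks for entry i live in separate arrays (structure of arrays) or side by side in one record (array of structures). With 500 entries, four 32-bit fields come to about 8 KB in either layout, which is small relative to typical CPU caches, so only a measurement can show whether the layout matters here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout 1: structure of arrays, one array per exit reason (as in the question).
// Assumes all three arrays have the same length.
struct ExitsSoA {
    std::vector<std::int32_t> byStrat, byLoss, byProfit;
};

// exitTick[i] = min(exitTickByStrat[i], exitTickByLoss[i], exitTickByProfit[i])
void resolveExits(const ExitsSoA& e, std::vector<std::int32_t>& exitTick) {
    const std::size_t n = e.byStrat.size();
    exitTick.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        exitTick[i] = std::min(e.byStrat[i], std::min(e.byLoss[i], e.byProfit[i]));
    }
}

// Layout 2: array of structures, the (entryTick, exitOnStrat, exitOnLoss,
// exitOnProfit) tuple proposed in the question.
struct TradeRecord {
    std::int32_t entryTick, exitOnStrat, exitOnLoss, exitOnProfit;
};

void resolveExits(const std::vector<TradeRecord>& trades,
                  std::vector<std::int32_t>& exitTick) {
    exitTick.resize(trades.size());
    for (std::size_t i = 0; i < trades.size(); ++i) {
        const TradeRecord& t = trades[i];
        exitTick[i] = std::min(t.exitOnStrat, std::min(t.exitOnLoss, t.exitOnProfit));
    }
}
```

Nested `std::min` calls are used instead of the initializer-list overload so the code also compiles with older compilers such as VS2010.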

So the final questions are: can this be made faster? And what is the best profiler for memory profiling with release binaries? I work on Win7 with VS2010.

Edit:
Many thanks to all. I tried to simplify my original question as much as possible, hence the confusion. To keep the terminology readable: target means an envisaged/realized profit, and stop means an envisaged/realized loss.

The optimizer is a brute-force one. I have some strategy settings (e.g. indicator periods), then min/max breakEvenAfter/breakEvenBy values, and then formulas that give stop/target values in ticks. These formulas are also subject to optimization. Hence, the optimization is structured like this:

for each in params
{
    calculateEntries()
    for each in beSettings
    {
        precalculateBeData()
        for each in targetFormulaSettings
        {
            precalculateTargetsAndRespectiveExitTicks()
            for each in stopFormulaSettings
            {
                precalculateStopsAndRespectiveExitTicks()
                evaluateExitsAndDetermineImprovement()
            }
        }
    }
}

So I precalculate as much as possible and only compute something when I need it. Of the 30 seconds total, 25 seconds are spent in evaluateExitsAndDetermineImprovement(), which does exactly what I described in the original question, i.e. picks min(exitOnPattern, exitOnStop, exitOnTarget). I need to call the function 100M times because there are 100M combinations of all parameters. But within the innermost loop, only the exitOnStops array changes. I can post some code if that helps. I’m grateful for all the comments!
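Since only the exitOnStops array changes inside the innermost loop, the part of the minimum that does not depend on the stop can be computed once per target setting. The following is a minimal sketch of that loop-invariant hoisting, not the author’s code: `precombineExits` and `resolveWithStops` are hypothetical names, and 32-bit tick indices are assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: called once per target setting, outside the stop loop.
// Combines the two exit arrays that stay fixed while the stops vary.
void precombineExits(const std::vector<std::int32_t>& exitOnPattern,
                     const std::vector<std::int32_t>& exitOnTarget,
                     std::vector<std::int32_t>& partialExit) {
    partialExit.resize(exitOnPattern.size());
    for (std::size_t i = 0; i < exitOnPattern.size(); ++i)
        partialExit[i] = std::min(exitOnPattern[i], exitOnTarget[i]);
}

// Hypothetical: called once per stop setting (the innermost loop).
// Only the stop array is new, so each entry needs a single comparison,
// and the result equals min(exitOnPattern, exitOnStop, exitOnTarget).
void resolveWithStops(const std::vector<std::int32_t>& partialExit,
                      const std::vector<std::int32_t>& exitOnStops,
                      std::vector<std::int32_t>& exitTick) {
    exitTick.resize(partialExit.size());
    for (std::size_t i = 0; i < partialExit.size(); ++i)
        exitTick[i] = std::min(partialExit[i], exitOnStops[i]);
}
```

In the loop nest, `precombineExits` would sit right after precalculateTargetsAndRespectiveExitTicks(), and `resolveWithStops` would replace the three-way minimum inside evaluateExitsAndDetermineImprovement(). This removes one comparison and one array read per entry per inner iteration, so any gain is likely modest and would need to be measured.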

1 Answer

Editorial Team · Answer 1 · 2026-06-12T20:33:22+00:00

So, after some work, I understood the advice by Alexandre C. When I ran cache-miss profiling, I found that over 15M calls of the evaluateExits() function there were only 30K cache misses, so cache misses cannot be what limits this function’s performance. Hence, I had to “start believing” that VTune is producing valid results after all, albeit weird ones. Since analyzing the VTune output no longer matches this thread’s title, I decided to start a new thread. Thank you all for your opinions and recommendations.
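Expressed per call, the reported counts work out to:

$$\frac{30\,000\ \text{misses}}{15\,000\,000\ \text{calls}} = 0.002\ \text{misses per call} = 1\ \text{miss per } 500\ \text{calls}$$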
