I’m optimizing a function. I have tried everything I can think of, even SSE, and I modified the code to return at different points to measure how long each part takes. In the end I found that most of the time is spent on the boolean test. Even if I replace all the code inside the if statement with a simple add operation, it still costs 6000 ms.
My platform is gcc 4.7.1 on an E5506 CPU. The inputs ‘a’ and ‘b’ are int arrays of size 1000, and ‘asize’ and ‘bsize’ are the corresponding array sizes. MATCH_MASK = 16383. I run the function 100000 times to measure the elapsed time. Does anyone have a good idea for this problem? Thank you!
if (aoffsets[i] && boffsets[i]) // this line costs most time
Code:
uint16_t aoffsets[DOUBLE_MATCH_MASK] = {0}; // important! or it will only be right on the first time
uint16_t* boffsets = aoffsets + MATCH_MASK;
uint8_t* seen = (uint8_t *)aoffsets;
auto fn_init_offsets = [](const int32_t* x, int n_size, uint16_t offsets[])->void
{
for (int i = 0; i < n_size; ++i)
offsets[MATCH_STRIP(x[i])] = i;
};
fn_init_offsets(a, asize, aoffsets);
fn_init_offsets(b, bsize, boffsets);
uint8_t topcount = 0;
int topoffset = 0;
{
std::vector<uint8_t> count_vec(asize + bsize + 1, 0); // it's the fastest way already, very near to tls
uint8_t* counts = &(count_vec[0]);
//return aoffsets[0]; // cost 1375 ms
for (int i = 0; i < MATCH_MASK; ++i)
{
if (aoffsets[i] && boffsets[i]) // this line costs most time
{
//++aoffsets[i]; // for test
int offset = (aoffsets[i] -= boffsets[i]);
if ((-n_maxoffset <= offset && offset <= n_maxoffset))
{
offset += bsize;
uint8_t n_cur_count = ++counts[offset];
if (n_cur_count > topcount)
{
topcount = n_cur_count;
topoffset = offset;
}
}
}
}
}
return aoffsets[0]; // cost 6000ms
You can increase the speed of your program by reducing the cache misses:
aoffsets[i] and boffsets[i] are relatively far away from each other in memory. By placing them next to each other, you speed up the program significantly. On my machine (E5400 CPU, VS2012) the execution time is reduced from 3.0 seconds to 2.3 seconds compared to your version of test().
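The interleaved layout can be sketched roughly as follows. This is only an illustration of the idea, not the code that was benchmarked: OffsetPair, match_strip and count_shared_slots are hypothetical names, and because MATCH_STRIP is not shown in the question, a plain bit mask stands in for it (so the table has MATCH_MASK + 1 slots here).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int MATCH_MASK = 16383;  // value given in the question

// Hypothetical layout: both offsets for one slot sit side by side,
// so the hot test reads one small struct instead of two distant array entries.
struct OffsetPair {
    uint16_t a;
    uint16_t b;
};

// Assumption: stand-in for the question's MATCH_STRIP, which is not shown.
inline int match_strip(int32_t x) { return x & MATCH_MASK; }

// Fills the interleaved table and counts slots where both inputs left a
// nonzero index, mirroring the `if (aoffsets[i] && boffsets[i])` test.
int count_shared_slots(const int32_t* a, int asize,
                       const int32_t* b, int bsize) {
    std::vector<OffsetPair> slots(MATCH_MASK + 1);  // value-initialized to zero
    for (int i = 0; i < asize; ++i)
        slots[match_strip(a[i])].a = static_cast<uint16_t>(i);
    for (int i = 0; i < bsize; ++i)
        slots[match_strip(b[i])].b = static_cast<uint16_t>(i);

    int shared = 0;
    for (const OffsetPair& s : slots) {
        if (s.a && s.b)  // both fields come from the same struct
            ++shared;
    }
    return shared;
}
```

As in the question's code, an index of 0 cannot be told apart from an empty slot, so a value stored at position 0 of either array never counts as a match. The body of the if statement from the question would go where ++shared is, using s.a and s.b in place of aoffsets[i] and boffsets[i].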