I’ve done some profiling of the image-processing stage of a chamfer matcher written in OpenCV, and it seems that 70% of the time is spent in one function:
```cpp
void ImageProcessor::CarryOutOrientationTransform(int iReadBin, int iUpdateBin)
{
    cv::MatIterator_<float> lUpdateImageIterator;
    cv::MatConstIterator_<float> lReadImageIterator;
    for (lUpdateImageIterator = mOrientationBins[iUpdateBin].begin<float>(),
         lReadImageIterator = mOrientationBins[iReadBin].begin<float>();
         lReadImageIterator != mOrientationBins[iReadBin].end<float>();
         lUpdateImageIterator++, lReadImageIterator++)
    {
        if (*lReadImageIterator + mOrientationCost < *lUpdateImageIterator)
        {
            *lUpdateImageIterator = *lReadImageIterator + mOrientationCost;
        }
    }
}
```
The function is called as follows:
```cpp
// Transform over the image clockwise 1.5 times
for (int lI = 0;
     lI <= mNumberOfOrientationBins + (mNumberOfOrientationBins - 1) / 2;
     lI++)
{
    CarryOutOrientationTransform(lI % mNumberOfOrientationBins,
                                 (lI + 1) % mNumberOfOrientationBins);
}
```

The same is then done in reverse, anti-clockwise.
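To see how much work this call pattern implies, the sketch below reproduces the (read bin, update bin) pairs visited by the clockwise loop. The question does not show the anti-clockwise indexing, so `anticlockwisePairs` is a hypothetical mirror of the clockwise loop, and both function names are illustrative. With 20 bins the loop runs for `lI` from 0 to 29, i.e. 30 calls per direction; assuming the reverse pass makes as many calls, that is roughly 60 full passes over two images each, compared with one distance transform per bin.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// (read bin, update bin) pairs visited by the clockwise loop in the question.
std::vector<std::pair<int, int>> clockwisePairs(int numBins)
{
    assert(numBins > 0);
    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i <= numBins + (numBins - 1) / 2; ++i)
        pairs.emplace_back(i % numBins, (i + 1) % numBins);
    return pairs;
}

// Hypothetical anti-clockwise mirror: the question only says "the reverse
// anti-clockwise", so this indexing is an assumption, not the author's code.
std::vector<std::pair<int, int>> anticlockwisePairs(int numBins)
{
    assert(numBins > 0);
    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i <= numBins + (numBins - 1) / 2; ++i)
        pairs.emplace_back((numBins - i % numBins) % numBins,
                           (numBins - (i + 1) % numBins) % numBins);
    return pairs;
}
```

Each pair corresponds to one call of `CarryOutOrientationTransform`, i.e. one read of a whole image and one read-modify-write of another.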
`ImageProcessor::mOrientationBins` is declared as:

```cpp
std::vector<cv::Mat> mOrientationBins;
```
The rest of the time is spent on line segmentation and binning, distance-transforming all 20 bins, and then integrating over all the images (I’ve disabled matching). The time spent on the orientation transform seems unreasonably large compared to the rest. Cachegrind also reports far more L1 and LL misses for this function than for the rest of the code. I can’t understand this, given that the iterators pass through memory linearly and the L1 associativity is 2.
Is the time spent in this code reasonable, or have I missed a trick?
I think you will gain even more by writing the loop in plain, old-fashioned pointer style.
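A minimal sketch of that pointer-style loop, kept self-contained without OpenCV: the hypothetical `FloatImage` struct stands in for a single-channel `CV_32F` `cv::Mat`, its `ptr(row)` plays the role of `cv::Mat::ptr<float>(row)`, `orientationTransform` is an illustrative name, and `cost` stands for `mOrientationCost`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a single-channel float cv::Mat: row-major data
// whose row stride (in elements) may exceed cols, as in a non-continuous Mat.
struct FloatImage {
    int rows = 0;
    int cols = 0;
    std::size_t stride = 0;  // elements between the starts of consecutive rows
    std::vector<float> data;

    FloatImage(int r, int c, float fill, std::size_t s = 0)
        : rows(r), cols(c), stride(s ? s : static_cast<std::size_t>(c)),
          data(static_cast<std::size_t>(r) * (s ? s : static_cast<std::size_t>(c)), fill) {}

    // Plays the role of cv::Mat::ptr<float>(row).
    float* ptr(int row) { return data.data() + row * stride; }
    const float* ptr(int row) const { return data.data() + row * stride; }
};

// update[r][c] = min(update[r][c], read[r][c] + cost).
// Row pointers are fetched once per row, so the inner loop is a plain indexed
// walk over two float arrays with no per-element iterator bookkeeping.
void orientationTransform(const FloatImage& read, FloatImage& update, float cost)
{
    assert(read.rows == update.rows && read.cols == update.cols);
    for (int r = 0; r < read.rows; ++r) {
        const float* in = read.ptr(r);
        float* out = update.ptr(r);
        for (int c = 0; c < read.cols; ++c) {
            const float candidate = in[c] + cost;
            if (candidate < out[c])
                out[c] = candidate;
        }
    }
}
```

In the real class, the two row pointers would come from `mOrientationBins[iReadBin].ptr<float>(r)` and `mOrientationBins[iUpdateBin].ptr<float>(r)`.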
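Using iterators in such a context is not recommended: they are clean and safe, but a bit slow. Each `++` on a `cv::MatIterator_` has to check whether it has reached the end of the current row (so that non-continuous matrices work), and the loop condition in the question calls `end<float>()` on every iteration.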
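If both bins are stored continuously (`cv::Mat::isContinuous()` returns true, which is always the case for matrices allocated with `cv::Mat::create()`), the row loop can be dropped and the image treated as one flat array of `rows * cols` floats. The sketch below assumes that layout and also replaces the branch with `std::min`, which compilers can often turn into vector min instructions; `orientationTransformFlat` is a hypothetical name. OpenCV’s own `cv::min` computes the same element-wise minimum over whole arrays and may be worth benchmarking against it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Branch-free variant for contiguous storage: walks both images as flat
// arrays of n floats and keeps the smaller of update[i] and read[i] + cost.
void orientationTransformFlat(const float* read, float* update, std::size_t n, float cost)
{
    assert(read != nullptr && update != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        update[i] = std::min(update[i], read[i] + cost);
}
```

With a continuous `cv::Mat`, `read` and `update` would be `ptr<float>(0)` of the two bins and `n` would be `rows * cols`.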