I have an IplImage from OpenCV, which stores its data in row-major order: the image data is stored in a one-dimensional array char *data, and the element at position x,y is given by
elem(x,y) = data[y*width + x] // see note at end
I would like to convert this image, as quickly as possible, to and from a second image format that stores its data in column-major order; that is
elem(x,y) = data[x*height + y]
Obviously, one way to do this conversion is element by element, with a double for loop.
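For reference, the straightforward double loop might look like the sketch below. It assumes 1-byte elements with no row padding (widthStep = width), and the function names row_to_col_naive and col_to_row_naive are hypothetical. The inner loop reads the source sequentially but writes the destination with a stride of height bytes, which is what makes this version slow on large images.

```c
#include <stddef.h>

/* Naive row-major -> column-major conversion, one element at a time.
 * src: elem(x,y) = src[y*width + x]
 * dst: elem(x,y) = dst[x*height + y]
 * Assumes 1-byte elements and no row padding. */
void row_to_col_naive(const char *src, char *dst, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            dst[x * height + y] = src[y * width + x];
}

/* The inverse direction: column-major -> row-major. */
void col_to_row_naive(const char *src, char *dst, size_t width, size_t height)
{
    for (size_t x = 0; x < width; ++x)
        for (size_t y = 0; y < height; ++y)
            dst[y * width + x] = src[x * height + y];
}
```

For a 3x2 image stored row-major as {1,2,3,4,5,6}, row_to_col_naive produces {1,4,2,5,3,6}, and col_to_row_naive turns that back into the original.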
Is there a faster way?
Note for OpenCV aficionados: the actual location of elem(x,y) is data + y*widthStep + x*sizeof(element). The formula above gives the general idea; for char data sizeof(element) = 1, and if we also make widthStep = width, the formula is exact.
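If the rows are padded (widthStep > width), the row start has to be computed from the stride rather than from width. A minimal sketch, assuming an 8-bit single-channel image; for a legacy IplImage one would pass its imageData and widthStep fields, but the function name strided_row_to_col is hypothetical and the code itself does not depend on OpenCV headers.

```c
#include <stddef.h>

/* Row-major source whose rows are widthStep bytes apart (padding allowed)
 * -> tightly packed column-major destination, dst[x*height + y].
 * Assumes 1-byte elements (8-bit, single channel). */
void strided_row_to_col(const char *data, size_t widthStep,
                        char *dst, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        const char *row = data + y * widthStep; /* start of row y; padding is skipped */
        for (size_t x = 0; x < width; ++x)
            dst[x * height + y] = row[x];
    }
}
```

With width = 3, height = 2 and widthStep = 4, the padding byte at the end of each row is simply never read.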
It is called “matrix transposition”.
Optimal methods try to minimise the number of cache misses by swapping small tiles whose size is one or a few cache lines. With a multi-level cache this gets difficult, because a tile size that suits one level may not suit another.
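A minimal sketch of the tile-swapping idea, assuming a square n x n matrix of 1-byte elements stored row-major (in-place transposition of a non-square image needs a different, cycle-following algorithm). The tile size TILE = 32 is a hypothetical value to be tuned for the target cache, and the function name is illustrative only.

```c
#include <stddef.h>

#define TILE 32 /* hypothetical tile size; tune for the target cache */

static void swap_chars(char *a, char *b)
{
    char t = *a;
    *a = *b;
    *b = t;
}

/* In-place transpose of a square n x n row-major matrix.
 * Tile (bi,bj) on or above the diagonal is swapped element-wise with
 * the mirror tile (bj,bi), so each tile pair is touched only once. */
void transpose_square_inplace(char *m, size_t n)
{
    for (size_t bi = 0; bi < n; bi += TILE) {
        size_t iend = bi + TILE < n ? bi + TILE : n;
        for (size_t bj = bi; bj < n; bj += TILE) {
            size_t jend = bj + TILE < n ? bj + TILE : n;
            for (size_t i = bi; i < iend; ++i) {
                /* On a diagonal tile only visit j > i, so no pair is swapped twice. */
                size_t jstart = (bi == bj) ? i + 1 : bj;
                for (size_t j = jstart; j < jend; ++j)
                    swap_chars(&m[i * n + j], &m[j * n + i]);
            }
        }
    }
}
```

Restricting the diagonal tiles to j > i is what keeps the swap from undoing itself; off-diagonal tiles always satisfy j > i automatically.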
start reading here
this one is a bit more advanced
BTW, the URLs deal with “in-place” transposition. Creating a transposed copy will be different (it uses twice as many cache lines, duh!)
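For the copy case asked about in the question, a cache-blocked loop is one common approach. This is a sketch under the same assumptions as above (1-byte elements, no row padding); BLOCK = 32 and the name row_to_col_blocked are hypothetical. Source and destination are both touched one small block at a time, so the rows being read and the columns being written stay within a small set of cache lines.

```c
#include <stddef.h>

#define BLOCK 32 /* hypothetical block size; tune for the target cache */

/* Out-of-place, cache-blocked row-major -> column-major copy.
 * src: elem(x,y) = src[y*width + x]
 * dst: elem(x,y) = dst[x*height + y]
 * Works for any width/height; partial blocks at the edges are clamped. */
void row_to_col_blocked(const char *src, char *dst, size_t width, size_t height)
{
    for (size_t by = 0; by < height; by += BLOCK) {
        size_t yend = by + BLOCK < height ? by + BLOCK : height;
        for (size_t bx = 0; bx < width; bx += BLOCK) {
            size_t xend = bx + BLOCK < width ? bx + BLOCK : width;
            for (size_t y = by; y < yend; ++y)
                for (size_t x = bx; x < xend; ++x)
                    dst[x * height + y] = src[y * width + x];
        }
    }
}
```

A column-major width x height image has the same memory layout as a row-major height x width one, so the reverse conversion is the same function with the dimensions swapped: row_to_col_blocked(col, row, height, width).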