I need to efficiently swap the byte order of an array during copying into another array.
The source array has a known element type (char, short or int), so the byte swapping required is unambiguous and follows from that type.
My plan is to do this very simply with a multi-pass byte-wise copy (2 passes for short, 4 for int, …). However, are there any pre-existing “memcpy_swap_16/32/64” functions or libraries? Perhaps something from image processing, where BGR/RGB conversion is a similar problem.
EDIT
I know how to swap the bytes of individual values, that is not the problem. I want to do this process during a copy that I am going to perform anyway.
For example, if I have an array of little-endian 4-byte integers, I can do the swap by performing 4 byte-wise copies with initial offsets of 0, 1, 2 and 3 and a stride of 4. But there may be a better way: perhaps reading each 4-byte integer individually and using the byte-swap intrinsics _byteswap_ushort, _byteswap_ulong and _byteswap_uint64 would be faster. Still, I suspect there must be existing functions that do this type of processing.
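To make the multi-pass idea concrete, here is a minimal sketch in C. The function name `copy_swap_strided` and its parameters are made up for illustration, and it assumes the source and destination buffers do not overlap.

```c
#include <stddef.h>

/* Hypothetical helper: copy `count` elements of `elem_size` bytes each
   from src to dst, reversing the byte order inside every element.
   Assumes dst and src do not overlap. */
static void copy_swap_strided(void *dst, const void *src,
                              size_t count, size_t elem_size)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;

    /* One pass per byte position: pass k reads byte k of each source
       element (stride elem_size) and writes it to the mirrored
       position elem_size - 1 - k of the destination element. */
    for (size_t k = 0; k < elem_size; ++k) {
        for (size_t i = 0; i < count; ++i) {
            d[i * elem_size + (elem_size - 1 - k)] = s[i * elem_size + k];
        }
    }
}
```

For 4-byte integers this is 4 passes over the data, so each element is touched 4 times; that repeated traffic is why a single pass that swaps whole elements may well be faster.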
EDIT 2
Just found this, which may be a useful basis for an SSE version, though it's true that memory bandwidth probably makes it a waste of time.
Yes, there are existing functions like the one linked in the question, but it’s not worth the effort because the size of the data (in this case) means the setup overhead is too high. Instead, it’s better to just read 2, 4 or 8 bytes at a time, do the swap using intrinsics, and write the result back.
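As a rough sketch of that approach for the 4-byte case: the names `swap32` and `memcpy_swap_32` are hypothetical (the latter borrowed from the wording of the question), and the non-MSVC branch assumes GCC or Clang, where `__builtin_bswap32` is available. The 2- and 8-byte variants would follow the same pattern with `_byteswap_ushort` / `_byteswap_uint64`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* _byteswap_ulong */
#endif

/* Hypothetical wrapper: pick whichever 32-bit swap the compiler offers. */
static uint32_t swap32(uint32_t v)
{
#if defined(_MSC_VER)
    return (uint32_t)_byteswap_ulong(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    /* Plain-C fallback using shifts and masks. */
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}

/* Copy `count` 32-bit elements from src to dst, swapping each one.
   memcpy is used for the loads and stores so that unaligned buffers
   are handled safely; compilers typically turn it into a plain move. */
static void memcpy_swap_32(void *dst, const void *src, size_t count)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;

    for (size_t i = 0; i < count; ++i) {
        uint32_t v;
        memcpy(&v, s + 4 * i, sizeof v);   /* read one element   */
        v = swap32(v);                     /* reverse its bytes  */
        memcpy(d + 4 * i, &v, sizeof v);   /* write it back out  */
    }
}
```

This touches each element once, in a single pass, and leaves any vectorization to the compiler.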