Today I ran into the problem that accessing vector elements slows down as the vector grows. As it is not my code, I cannot post it, so please bear with me. I will try to describe it in as much detail as possible.
The functionality of the code is as follows:
1. A Dataset class takes a .txt file containing file names. These point to standard PNG images that need to be loaded, which is done by an Image<T> class. The images are loaded as Image<unsigned char> and pushed back into a std::vector.
2. Once the data has been loaded, I can access the vector in my dataset in order to work with it. It looks something like this:
Dataset d;
d.init("filenames_list.txt"); //Loads the images
for(int i=0; i< d.getDatavector().size(); i++){
Image<unsigned char> current = d.getDatavector()[i];
//Do work on current image here.
}
Here getDatavector() returns a std::vector<Image<unsigned char> >. Each image holds three ints (width, height, and number of channels) plus a Boost shared pointer to the interleaved data.
For small test runs, I have a file list with about 150 images. Running the program on it works fine, and speed measurements tell me that
Image<unsigned char> current = d.getDatavector()[i];
takes about 10 ms to complete. If, however, I work on my full dataset of 1500 images, the above line takes about 500 ms. I've tried many different things to fix it, but I am somewhat limited by the general structure of the code and by memory, because if I do the following:
const std::vector<Image<unsigned char> > data = d.getDatavector();
before the loop, it runs very fast, but I soon run out of memory.
I know my problem description is somewhat vague, and I am not hoping for the exact solution, but I am hoping for some tips on where to look. I searched for similar problems, but people only seem concerned with the general speed of vectors versus arrays. My problem, though, is that the speed degrades with the length of the vector! If someone has seen this kind of problem, any suggestions are very much welcome!
So far I have tried accessing the content using the std::vector::iterator or using (d.getDatavector().data()) as a pointer. Nothing seems to improve the speed of it.
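What does the signature of getDatavector() look like? Is it something like this?
std::vector<Image<unsigned char> > getDatavector();
If so, the function is returning the vector by value, and every time you write d.getDatavector()[i], a copy of the whole vector is made, element i is copied out of it, and then the temporary vector is destroyed. That is why the cost of a single access grows with the size of the vector.
If you can modify the Dataset class, change the function to return a const reference:
const std::vector<Image<unsigned char> >& getDatavector() const;
Now copies won't be made every time the function is called.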
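To make the difference concrete, here is a minimal sketch that counts element copies for both getter styles. It is not the asker's code: CountingImage, DatasetSketch, getByValue, getByRef, and the two helper functions are hypothetical names invented for illustration, and CountingImage stands in for Image<unsigned char> only to the extent that it can be copied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Image<unsigned char>: it only counts copies.
struct CountingImage {
    static std::size_t copies;
    CountingImage() = default;
    CountingImage(const CountingImage&) { ++copies; }
    CountingImage& operator=(const CountingImage&) { ++copies; return *this; }
};
std::size_t CountingImage::copies = 0;

// Hypothetical stand-in for Dataset, with both getter styles side by side.
class DatasetSketch {
public:
    explicit DatasetSketch(std::size_t n) : data_(n) {}

    // Returns by value: every call hands back a fresh copy of the whole vector.
    std::vector<CountingImage> getByValue() const { return data_; }

    // Returns a const reference: no elements are copied by the call.
    const std::vector<CountingImage>& getByRef() const { return data_; }

private:
    std::vector<CountingImage> data_;
};

// Number of element copies caused by one indexed read through the by-value getter.
std::size_t copiesPerAccessByValue(const DatasetSketch& d, std::size_t i) {
    CountingImage::copies = 0;
    CountingImage current = d.getByValue()[i];  // copies every element, then element i
    (void)current;
    return CountingImage::copies;
}

// Number of element copies caused by one indexed read through the reference getter.
std::size_t copiesPerAccessByRef(const DatasetSketch& d, std::size_t i) {
    CountingImage::copies = 0;
    const CountingImage& current = d.getByRef()[i];  // binds to the stored element
    (void)current;
    return CountingImage::copies;
}
```

With a DatasetSketch holding n elements, the by-value helper reports n + 1 copies per access while the by-reference helper reports none, which matches the observation that a single access gets slower as the dataset grows.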
If you’re unable to modify the class, make a single copy before entering the loop, and then use the local variable within the loop.
It is impossible for the problem to be the indexing itself, since std::vector's underlying data array is required to be contiguous, so accessing the ith element is as simple as adding i to the pointer marking the start of the data array and dereferencing the result.
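As a small illustration of that last point, the sketch below checks that indexing and pointer arithmetic on data() reach the same element. The function name indexIsPointerOffset is made up for this example, and a vector of int is used instead of images to keep it short.

```cpp
#include <cstddef>
#include <vector>

// Returns true when v[i] and *(v.data() + i) are the same object.
// The caller must ensure i < v.size().
bool indexIsPointerOffset(const std::vector<int>& v, std::size_t i) {
    const int* base = v.data();   // start of the contiguous storage
    return &v[i] == base + i;     // indexing is just an offset from base
}
```

The check holds for any valid index regardless of the vector's length, so the access itself does not depend on how many elements are stored.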