Description:
I am profiling my application to improve its overall performance. Among the bottlenecks I have identified, one of the major ones appears to be the _wordcopy_fwd_dest_aligned function.
Below is a short description of the problem:
- The function receives a buffer containing a length followed by a character stream (length-value format).
- Read the length (with some code to handle machine alignment, etc.).
- Read the character array and assign it to the string.
Pseudocode
The pseudocode looks like this:
read_buf(max_len)
v.assign((char*)pdata,max_len)
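For readers who want something concrete, the steps described above might look roughly like the sketch below. The wire format (a 4-byte big-endian length followed by the payload) and the name read_lv are assumptions made for illustration only; the real code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wire format (an assumption, not taken from the real code):
// a 4-byte big-endian length, followed by that many payload bytes.
// Returns the number of bytes consumed, or 0 if the buffer is too short.
std::size_t read_lv(const unsigned char* buf, std::size_t avail, std::string& out)
{
    assert(buf != 0);
    if (avail < 4) return 0;

    // Assemble the length one byte at a time: this needs no aligned load
    // and does not depend on the byte order of the host machine.
    unsigned long len = (static_cast<unsigned long>(buf[0]) << 24) |
                        (static_cast<unsigned long>(buf[1]) << 16) |
                        (static_cast<unsigned long>(buf[2]) << 8)  |
                         static_cast<unsigned long>(buf[3]);
    if (len > avail - 4) return 0;

    // The one bulk copy: bytes go from the input buffer into the string's storage.
    out.assign(reinterpret_cast<const char*>(buf + 4), len);
    return 4 + len;
}
```

The assign call is where the bulk memory copy happens, so this is the line a profiler would be expected to attribute the copy routine to.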
When I check the CPU cycles in the Quantify report, I see heavy utilization in _wordcopy_fwd_dest_aligned. Reports such as "Improve CPU Cycles for performance" also suggest reducing this by using an alternative approach.
Question
- Is there a simple alternative to the above code that reduces or eliminates the _wordcopy_fwd_dest_aligned usage and so improves performance (even at the cost of memory)?
- If the above does not work, is there a suggested workaround for the above code? The final output still needs to be a string.
PS:
a. Since the code needs to work in a distributed environment, word alignment etc. needs to be handled, so I am a bit hesitant about option (2) in the Question list.
b. We are using the STLport library. Does it need any tweaking, or could it be causing the problem? A simple test with std::string v.assign(...) did not show _wordcopy_fwd_dest_aligned.
That IS the optimized copy routine. To get more performance, you will likely have to eliminate copies, or else sacrifice compatibility with some processor models.
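One way to eliminate a copy, assuming you control how the bytes arrive, is to size the string first and read the payload straight into its storage, so that no intermediate buffer is copied into the string afterwards. The sketch below is only an illustration under that assumption: read_payload is a hypothetical helper, std::istream stands in for whatever the real data source is, and writing through &out[0] relies on contiguous string storage (guaranteed since C++11, and something to verify for an older library such as STLport).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for the std::istringstream usage shown below
#include <string>

// Hypothetical helper: pull `len` payload bytes from `in` directly into `out`.
// Returns false (and leaves `out` empty) if fewer than `len` bytes were available.
bool read_payload(std::istream& in, std::size_t len, std::string& out)
{
    out.resize(len);                 // allocate (and zero-fill) the final storage once
    if (len == 0) return true;

    // Assumption: the string's storage is contiguous, so &out[0] is a valid
    // destination for `len` bytes.
    in.read(&out[0], static_cast<std::streamsize>(len));
    if (!in) {                       // short read: the stream is now in a failed state
        out.clear();
        return false;
    }
    assert(static_cast<std::size_t>(in.gcount()) == len);
    return true;
}
```

For example, with std::istringstream in("hello world"), calling read_payload(in, 5, v) leaves v equal to "hello". Note that resize still writes zeros to the storage before it is filled, so this trades the buffer-to-string copy for a fill; whether that is a net gain is something to measure.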