Description: I am analysing my application to improve the overall performance, and among the

Question


Asked: June 3, 2026 (2026-06-03T09:53:25+00:00)


Description:

I am analysing my application to improve its overall performance. Among the bottlenecks I have identified, one of the major areas draining performance appears to be the _wordcopy_fwd_dest_aligned function.

Below is a short description of the problem:

The function receives a buffer stream containing the length and the character data in length-value format.
It reads the length (with some code to handle machine alignment etc.).
It reads the character array and assigns it to the string.
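The three steps above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the length prefix is taken to be 4 bytes in big-endian order, and read_record is a hypothetical name. The length is assembled byte by byte, so the read does not depend on the alignment or the endianness of the host.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wire format (an assumption, not taken from the question):
// a 4-byte big-endian length followed by that many payload bytes.
// Returns the number of bytes consumed, or 0 if the buffer is too short.
std::size_t read_record(const unsigned char* buf, std::size_t avail,
                        std::string& out) {
    assert(buf != 0);
    if (avail < 4) return 0;

    // Build the length one byte at a time: no unaligned word load and
    // no dependence on the byte order of the machine running the code.
    unsigned long len = (static_cast<unsigned long>(buf[0]) << 24) |
                        (static_cast<unsigned long>(buf[1]) << 16) |
                        (static_cast<unsigned long>(buf[2]) << 8) |
                         static_cast<unsigned long>(buf[3]);

    if (len > avail - 4) return 0;  // payload is truncated

    // This is the copy the profiler attributes to the memcpy internals.
    out.assign(reinterpret_cast<const char*>(buf + 4), len);
    return 4 + len;
}
```

Note that the assign call still copies the payload once; the sketch only shows where that copy sits relative to the length handling.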

Pseudocode
The pseudocode looks as below:

read_buf(max_len)  
v.assign((char*)pdata,max_len)

When I check the CPU cycles in the Quantify report, I see heavy utilization in _wordcopy_fwd_dest_aligned. Reports such as "Improve CPU Cycles for performance" also suggest that the idea would be to reduce this by using an alternative approach.

Question

1. Is there a simple alternative to the above code that reduces or eliminates the _wordcopy_fwd_dest_aligned usage and so improves performance (even at the cost of memory)?
2. If the above does not work, is there any suggested workaround for the above code? The final output still needs to be a string.

PS:
a. Since the code needs to work in a distributed environment, word alignment etc. needs to be handled, so I am a bit hesitant about option (2) in the Question list.
b. We are using the STLport library. Does this need any tweaking, or can it cause the problem? A simple program with std::string v.assign(...) did not show _wordcopy_fwd_dest_aligned.


1 Answer

Editorial Team · Answer 1 · 2026-06-03T09:53:26+00:00


That IS the optimized copy routine. To get more performance, you will likely have to eliminate copies, or else sacrifice compatibility with some processor models.
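One way to read "eliminate copies" is to keep a non-owning view of the bytes while they sit in the receive buffer and to build a std::string only where one is really required. The sketch below is an assumption about how that might look, not something taken from the question: FieldView and view_field are hypothetical names, and the view is only valid for as long as the underlying buffer is alive and unmodified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view: it records where the bytes live instead
// of copying them. It must not outlive the buffer it points into.
struct FieldView {
    const char* data;
    std::size_t size;

    // Compare against a C string without materialising a std::string.
    bool equals(const char* s) const {
        return std::strlen(s) == size && std::memcmp(data, s, size) == 0;
    }

    // Pay for the copy only when an owning string is actually needed.
    std::string to_string() const { return std::string(data, size); }
};

// Stands in for v.assign((char*)pdata, max_len) when no copy is wanted.
inline FieldView view_field(const void* pdata, std::size_t max_len) {
    assert(pdata != 0 || max_len == 0);
    FieldView v = { static_cast<const char*>(pdata), max_len };
    return v;
}
```

Since the question requires the final output to be a string, this only defers the copy to the to_string() call. It helps when some fields are inspected or discarded without ever needing an owning string.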
