The Archive Base Latest Questions

Editorial Team
Asked: June 13, 2026 (2026-06-13T11:51:26+00:00)

The following is output from the perf Linux profiler on C++ code produced by

The following is output from the perf Linux profiler on C++ code produced by gcc. I am calculating (a[i]+b[i])^c[i] in a loop that runs from i=n downwards and exits at i=-1. This is by far the hottest loop in my program, which can run for hours or days.

If I am understanding this output correctly, perf is telling me that 57% of the time in this function is spent on subtracting 8 from the rdx register. That seems unlikely, seeing as subtracting 1 from the rcx register three lines above is only taking 0.99% of the time. I think I must be missing something. What is the explanation for these numbers? Is the time for the previous instructions somehow unfairly getting charged to the subtraction?

    3.64 :          484388:       mov    0x0(%rbp,%rdx,1),%rax
    0.64 :          48438d:       add    (%rbx,%rdx,1),%rax
    0.99 :          484391:       sub    $0x1,%rcx
    3.60 :          484395:       xor    (%rdi,%rdx,1),%rax
   57.13 :          484399:       sub    $0x8,%rdx
    0.22 :          48439d:       or     %rax,%rsi
    4.23 :          4843a0:       cmp    $0xffffffffffffffff,%rcx
    0.00 :          4843a4:       jne    484388

I got these numbers by doing “perf record ./myprogram”, then “perf report” in the same directory and then I browsed to this piece of assembly.
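
For readers following along, the loop in the listing corresponds roughly to the C++ below. This is a sketch reconstructed from the assembly, not the original source: the function name `or_of_sum_xor`, the use of `std::vector`, the 64-bit element type, and the assignment of the three base registers to `a`, `b`, and `c` are all assumptions. The OR into an accumulator is inferred from the `or %rax,%rsi` instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reconstruction of the hot loop: OR together (a[i] + b[i]) ^ c[i]
// for every index, walking from the last element down to index 0.
std::uint64_t or_of_sum_xor(const std::vector<std::uint64_t>& a,
                            const std::vector<std::uint64_t>& b,
                            const std::vector<std::uint64_t>& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::uint64_t acc = 0;                      // plays the role of %rsi
    for (std::size_t i = a.size(); i-- > 0;) {  // counts down like %rcx
        std::uint64_t v = a[i];                 // mov: load a[i]
        v += b[i];                              // add: + b[i]
        v ^= c[i];                              // xor: ^ c[i]
        acc |= v;                               // or:  fold into accumulator
    }
    return acc;
}
```

Each iteration performs three loads, one from each array, so the cost of the loop is likely to be dominated by memory access rather than by the two `sub` instructions that only update the index and the byte offset.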

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 11:51 am

    I found this on the perf wiki:

    Interrupt-based sampling introduces skids on modern processors. That
    means that the instruction pointer stored in each sample designates
    the place where the program was interrupted to process the PMU
    interrupt, not the place where the counter actually overflows, i.e.,
    where it was at the end of the sampling period. In some case, the
    distance between those two points may be several dozen instructions or
    more if there were taken branches. When the program cannot make
    forward progress, those two locations are indeed identical. For this
    reason, care must be taken when interpreting profiles.

    That may be the explanation. Unfortunately, the wiki does not say how to figure out if this is indeed the problem or how to correct for this issue.
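
To make the quoted effect concrete, here is a toy model of the bookkeeping only, not of how any real PMU works. It assumes a loop of instructions with made-up per-instruction cycle costs, a fixed sampling period, and a fixed skid of a given number of instructions. The name `simulate_skid` and all numbers used with it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Attribution {
    std::vector<std::uint64_t> true_hits;      // where the counter overflowed
    std::vector<std::uint64_t> reported_hits;  // where the sample was recorded
};

// Toy model: the loop body repeats forever, instruction i takes cost_cycles[i]
// cycles, a sample fires every `period` cycles, and the recorded instruction
// pointer lands `skid` instructions after the one that was really executing.
Attribution simulate_skid(const std::vector<std::uint64_t>& cost_cycles,
                          std::uint64_t period, std::size_t skid,
                          std::uint64_t total_cycles) {
    const std::size_t n = cost_cycles.size();
    Attribution out{std::vector<std::uint64_t>(n, 0),
                    std::vector<std::uint64_t>(n, 0)};
    std::uint64_t loop_cycles = 0;
    for (std::uint64_t c : cost_cycles) loop_cycles += c;
    assert(period > 0);
    if (loop_cycles == 0) return out;  // nothing to sample

    for (std::uint64_t t = period; t <= total_cycles; t += period) {
        // Find which instruction is executing at cycle t within one iteration.
        std::uint64_t offset = t % loop_cycles;
        std::size_t i = 0;
        while (offset >= cost_cycles[i]) {
            offset -= cost_cycles[i];
            ++i;
        }
        ++out.true_hits[i];
        ++out.reported_hits[(i + skid) % n];  // blame shifts forward by the skid
    }
    return out;
}
```

With eight instructions where only index 3 is expensive (say 20 cycles against 1 for the others) and a skid of 1, almost all reported samples land on index 4, even though index 4 is cheap. That is the same shape as the profile in the question, if one assumes the `xor` with a memory operand is the slow instruction and the `sub` after it absorbs the blame. On hardware that supports precise sampling, perf's `:p` and `:pp` event modifiers (for example `perf record -e cycles:pp ./myprogram`) request reduced skid and may give a more faithful attribution.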
