Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9180663
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 17, 20262026-06-17T18:06:38+00:00 2026-06-17T18:06:38+00:00

Consider the following two programs that perform the same computations in two different ways:

  • 0

Consider the following two programs that perform the same computations in two different ways:

// v1.c
#include <stdio.h>
#include <math.h>
int main(void) {
   int i, j;
   int nbr_values = 8192;
   int n_iter = 100000;
   float x;
   for (j = 0; j < nbr_values; j++) {
      x = 1;
      for (i = 0; i < n_iter; i++)
         x = sin(x);
   }
   printf("%f\n", x);
   return 0;
}

and

// v2.c
#include <stdio.h>
#include <math.h>
int main(void) {
   int i, j;
   int nbr_values = 8192;
   int n_iter = 100000;
   float x[nbr_values];
   for (i = 0; i < nbr_values; ++i) {
      x[i] = 1;
   }
   for (i = 0; i < n_iter; i++) {
      for (j = 0; j < nbr_values; ++j) {
         x[j] = sin(x[j]);
      }
   }
   printf("%f\n", x[0]);
   return 0;
}

When I compile them using gcc 4.7.2 with -O3 -ffast-math and run on a Sandy Bridge box, the second program is twice as fast as the first one.

Why is that?

One suspect is the data dependency between successive iterations of the i loop in v1. However, I don’t quite see what the full explanation might be.

(Question inspired by Why is my python/numpy example faster than pure C implementation?)

EDIT:

Here is the generated assembly for v1:

        movl    $8192, %ebp
        pushq   %rbx
LCFI1:
        subq    $8, %rsp
LCFI2:
        .align 4
L2:
        movl    $100000, %ebx
        movss   LC0(%rip), %xmm0
        jmp     L5
        .align 4
L3:
        call    _sinf
L5:
        subl    $1, %ebx
        jne     L3
        subl    $1, %ebp
        .p2align 4,,2
        jne     L2

and for v2:

        movl    $100000, %r14d
        .align 4
L8:
        xorl    %ebx, %ebx
        .align 4
L9:
        movss   (%r12,%rbx), %xmm0
        call    _sinf
        movss   %xmm0, (%r12,%rbx)
        addq    $4, %rbx
        cmpq    $32768, %rbx
        jne     L9
        subl    $1, %r14d
        jne     L8
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-17T18:06:39+00:00Added an answer on June 17, 2026 at 6:06 pm

    Ignore the loop structure all together, and only think about the sequence of calls to sin. v1 does the following:

    x <-- sin(x)
    x <-- sin(x)
    x <-- sin(x)
    ...
    

    that is, each computation of sin( ) cannot begin until the result of the previous call is available; it must wait for the entirety of the previous computation. This means that for N calls to sin, the total time required is 819200000 times the latency of a single sin evaluation.

    In v2, by contrast, you do the following:

    x[0] <-- sin(x[0])
    x[1] <-- sin(x[1])
    x[2] <-- sin(x[2])
    ...
    

    notice that each call to sin does not depend on the previous call. Effectively, the calls to sin are all independent, and the processor can begin on each as soon as the necessary register and ALU resources are available (without waiting for the previous computation to be completed). Thus, the time required is a function of the throughput of the sin function, not the latency, and so v2 can finish in significantly less time.


    I should also note that DeadMG is right that v1 and v2 are formally equivalent, and in a perfect world the compiler would optimize both of them into a single chain of 100000 sin evaluations (or simply evaluate the result at compile time). Sadly, we live in an imperfect world.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Consider the two following programs: program one int main() { printf( hello\n ); }
Consider the following two toolbars that are in the same project of mine: Notice
Consider the following two files that are slightly different: foo (old version): <Line 1>
Consider the following program: #include <array> int main() { std::array<int, 1> x = {
Consider the following C++ Program #include<map> #include<iostream> int main() { int a = 5,
Consider the following two Java files that contain a simplified version of my issue
Consider the following two snippets: Exhibit A : template<typename CalcFuncT> int perform_calc(CalcFuncT&& calcfunc) {
Consider the following two FFI structs: class A < FFI::Struct layout :data, :int end
Consider the following two struct : struct a { int a; }; struct b
Consider the following two methods, written in pseudo code, that fetches a complex data

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.