Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6577643
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 25, 20262026-05-25T15:39:47+00:00 2026-05-25T15:39:47+00:00

When viewing the assembly output of the following code (no optimizations, -O2 and -O3

  • 0

When viewing the assembly output of the following code (no optimizations, -O2 and -O3 produce very similar results):

int main(int argc, char **argv)
{
    volatile float f1 = 1.0f;
    volatile float f2 = 2.0f;

    if(f1 > f2)
    {
        puts("+");
    }
    else if(f1 < f2)
    {
        puts("-");
    }

    return 0;
}

GCC does something that I have a hard time following:

.LC2:
    .string "+"
.LC3:
    .string "-"
    .text
.globl main
    .type   main, @function
main:
.LFB2:
    pushq   %rbp
.LCFI0:
    movq    %rsp, %rbp
.LCFI1:
    subq    $32, %rsp
.LCFI2:
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    movl    $0x3f800000, %eax
    movl    %eax, -4(%rbp)
    movl    $0x40000000, %eax
    movl    %eax, -8(%rbp)
    movss   -4(%rbp), %xmm1
    movss   -8(%rbp), %xmm0
    ucomiss %xmm0, %xmm1
    jbe .L9
.L7:
    movl    $.LC2, %edi
    call    puts
    jmp .L4
.L9:
    movss   -4(%rbp), %xmm1
    movss   -8(%rbp), %xmm0
    ucomiss %xmm1, %xmm0
    jbe .L4
.L8:
    movl    $.LC3, %edi
    call    puts
.L4:
    movl    $0, %eax
    leave
    ret

Why does GCC move the the float values into xmm0 and xmm1 twice and also run ucomiss twice?

Wouldn’t it be faster to do the following?

.LC2:
    .string "+"
.LC3:
    .string "-"
    .text
.globl main
    .type   main, @function
main:
.LFB2:
    pushq   %rbp
.LCFI0:
    movq    %rsp, %rbp
.LCFI1:
    subq    $32, %rsp
.LCFI2:
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    movl    $0x3f800000, %eax
    movl    %eax, -4(%rbp)
    movl    $0x40000000, %eax
    movl    %eax, -8(%rbp)
    movss   -4(%rbp), %xmm1
    movss   -8(%rbp), %xmm0
    ucomiss %xmm0, %xmm1
    jb  .L8 # jump if less than
    je  .L4 # jump if equal
.L7:
    movl    $.LC2, %edi
    call    puts
    jmp .L4
.L8:
    movl    $.LC3, %edi
    call    puts
.L4:
    movl    $0, %eax
    leave
    ret

I’m not at all a real assembly programmer, but it just seemed odd to me to have duplicate instructions running. Is there a problem with my version of the code?


Update

If you remove the volatile which I had originally and replace it with scanf(), you get the same results:

int main(int argc, char **argv)
{
    float f1;
    float f2;

    scanf("%f", &f1);
    scanf("%f", &f2);

    if(f1 > f2)
    {
        puts("+");
    }
    else if(f1 < f2)
    {
        puts("-");
    }

    return 0;
}

And the corresponding assembler:

.LCFI2:
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    leaq    -4(%rbp), %rsi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    scanf
    leaq    -8(%rbp), %rsi
    movl    $.LC0, %edi
    movl    $0, %eax
    call    scanf
    movss   -4(%rbp), %xmm1
    movss   -8(%rbp), %xmm0
    ucomiss %xmm0, %xmm1
    jbe .L9
.L7:
    movl    $.LC1, %edi
    call    puts
    jmp .L4
.L9:
    movss   -4(%rbp), %xmm1
    movss   -8(%rbp), %xmm0
    ucomiss %xmm1, %xmm0
    jbe .L4
.L8:
    movl    $.LC2, %edi
    call    puts
.L4:
    movl    $0, %eax
    leave
    ret

Final Update

After reviewing some of the follow up comments, it seems han (who commented under Jonathan Leffler’s post) nailed this problem. GCC does not make the optimization not because it can’t but because I hadn’t told it to. It seems it all comes down to IEEE floating point rules and to handle the strict conditions GCC can’t simply do a jump if above or jump if below after the first UCOMISS, because it needs to handle all the special conditions of floating point numbers. When using han’s recommendation of the -ffast-math optimizer (none of the -Ox flags enable -ffast-math as it can break some programs) GCC does exactly what I was looking for:

The following assembly was produced using GCC 4.3.2 “gcc -S -O3 -ffast-math test.c”

.LC0:
    .string "%f"
.LC1:
    .string "+"
.LC2:
    .string "-"
    .text
    .p2align 4,,15
.globl main
    .type   main, @function
main:
.LFB25:
    subq    $24, %rsp
.LCFI0:
    movl    $.LC0, %edi
    xorl    %eax, %eax
    leaq    20(%rsp), %rsi
    call    scanf
    leaq    16(%rsp), %rsi
    xorl    %eax, %eax
    movl    $.LC0, %edi
    call    scanf
    movss   20(%rsp), %xmm0
    comiss  16(%rsp), %xmm0
    ja  .L11
    jb  .L12
    xorl    %eax, %eax
    addq    $24, %rsp
    .p2align 4,,1
    .p2align 3
    ret
    .p2align 4,,10
    .p2align 3
.L12:
    movl    $.LC2, %edi
    call    puts
    xorl    %eax, %eax
    addq    $24, %rsp
    ret
    .p2align 4,,10
    .p2align 3
.L11:
    movl    $.LC1, %edi
    call    puts
    xorl    %eax, %eax
    addq    $24, %rsp
    ret

Notice the two UCOMISS instructions are now replaced with one COMISS directly followed by a JA (jump if above) and JB (jump if below). GCC is able to nail this optimization if you let it using -ffast-math!

UCOMISS vs COMISS (http://www.softeng.rl.ac.uk/st/archive/SoftEng/SESP/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc315.htm): “The UCOMISS instruction differs from the COMISS instruction in that it signals an invalid SIMD floating-point exception only when a source operand is an SNaN. The COMISS instruction signals invalid if a source operand is either a QNaN or an SNaN.”

Thanks again everyone for the helpful discussion.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-25T15:39:48+00:00Added an answer on May 25, 2026 at 3:39 pm

    Here’s another reason:
    If you take a close look at it, it’s NOT the same expression.

    They are not complements of each other. Therefore, you have to do two comparisons anyway. volatile will force the values to be reloaded.

    EDIT: (see comments, I forgot you can do that with the flags)

    To answer the new question:

    Combining the those two ucomiss is not a completely obvious optimization from the compiler’s perspective.

    In order to combine them, the compiler must:

    1. Recognize that ucomiss %xmm0, %xmm1 is the “same” as ucomiss %xmm1, %xmm0.
    2. Then it must do a common sub-expression elimination pass to pull it out.

    All of this needs to be done after the compiler does instruction selection. And most of the optimization passes are done before instruction selection.

    What worries me more is why f1 and f2 aren’t being kept in registers after you got rid of the volatiles. -O3 is really giving you this?

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm interested in viewing the actual x86 assembly output by a C# program (not
When viewing piped output to Less, sometimes I'd like to be able to view
I use emacs for viewing and editing code and other text files. I wanted
I'm viewing variables using the debugger. In debug builds everything in the code below
I have an error very similar to the one addressed in this question .
I've noticed that when viewing query results in a grid in SQL Server Management
From viewing the source code that this code makes it looks like itemValue generate
I'm viewing the stack at the beginning of main, but the ebp of main
When viewing a project in Jenkins, I'd like to see the last console output
When viewing a plain text code file (i.e. .py, .c, .cpp, .m, .as, .js,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.