Editorial Team
Asked: May 24, 2026

What is the difference in CPU cycles (or, in essence, in ‘speed’) between x

What is the difference in CPU cycles (or, in essence, in ‘speed’) between

 x /= y;

and

 #include <cmath>
 x = sqrt(y);

EDIT: I know the operations aren’t equivalent; I’m just arbitrarily proposing x /= y as a benchmark for x = sqrt(y).

  1. Editorial Team
    Added an answer on May 24, 2026 at 4:17 am

    The answer depends on your target platform. Assuming you are on one of the common x86 CPUs, see http://instlatx64.atw.hu/, a collection of measured instruction latencies (how long the CPU takes to produce a result once its arguments are ready) and pipelining data for many x86 and x86_64 processors. If your target is not x86, you can measure the cost yourself or consult your CPU's documentation.
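
If you do want to measure the cost yourself, one rough way is to time a long chain of dependent operations, so that each step has to wait for the previous result. The following is only a sketch under assumptions of mine: the names ns_per_step, sqrt_step and div_step are hypothetical, both steps include one extra addition (so it is the difference between the two timings that is meaningful, not the absolute values), and the result is in nanoseconds rather than ticks.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>

// Average nanoseconds per loop iteration for a chain in which every step
// consumes the previous result, so the CPU cannot overlap the operations.
template <typename Step>
double ns_per_step(Step step, double seed, std::int64_t iterations) {
    volatile double input = seed;   // volatile read: value is unknown at compile time
    double x = input;
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        x = step(x);
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile double sink = x;       // volatile write: the loop result must be computed
    (void)sink;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Both steps add 1.0 first, so the timing difference between them reflects
// sqrt versus divide; the add also keeps x away from trivial values.
inline double sqrt_step(double x) { return std::sqrt(x + 1.0); }
inline double div_step(double x)  { return 1.0e6 / (x + 1.0); }
```

Calling ns_per_step(sqrt_step, 2.0, 100000000) and ns_per_step(div_step, 2.0, 100000000) in an optimized build gives two numbers to compare. Multiplying by the clock rate in GHz gives an approximate tick count, but frequency scaling makes that conversion imprecise.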

    First, get a disassembly of your operations, either from the compiler (e.g. with gcc: gcc file.c -O3 -S -o file.asm) or by disassembling the compiled binary (e.g. with a debugger).
    Remember that your operation also loads and stores a value, and that cost must be counted separately.
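
To make the disassembly easy to read, it can help to put each operation into its own small function. A minimal sketch, assuming a hypothetical file named ops.cpp and function names of my own choosing:

```cpp
#include <cmath>

// The division from the question, isolated so its instructions are easy to find.
double divide(double x, double y) {
    x /= y;
    return x;
}

// The square root from the question, isolated in the same way.
double square_root(double y) {
    return std::sqrt(y);
}
```

Since this is C++, the compiler command would be something like g++ ops.cpp -O3 -S -o ops.asm. On x86-64 I would expect to find a divsd and a sqrtsd instruction in the output, possibly with a fallback call to the library sqrt for error handling, but check what your own compiler emits.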

    Here are two examples from friweb.hu:

    For the Core 2 Duo E6700, the latency (L) of SQRT (x87, SSE and SSE2 versions alike) is:

    • 29 ticks for 32-bit float; 58 ticks for 64-bit double; 69 ticks for 80-bit long double;

    and the latency of floating-point DIVIDE is:

    • 18 ticks for 32-bit; 32 ticks for 64-bit; 38 ticks for 80-bit

    On newer processors the cost is lower and almost the same for DIV and SQRT, e.g. on an Intel Sandy Bridge CPU:

    Floating-point SQRT is

    • 14 ticks for 32 bit; 21 ticks for 64 bit; 24 ticks for 80 bit

    Floating-point DIVIDE is

    • 14 ticks for 32 bit; 22 ticks for 64 bit; 24 ticks for 80 bit

    By the figures above, SQRT is even one tick faster for 64-bit.

    So: on older CPUs, sqrt takes roughly 1.6 to 1.8 times as long as fdiv (29 vs. 18, 58 vs. 32 and 69 vs. 38 ticks on the Core 2 Duo); on newer CPUs the cost is about the same.
    On newer CPUs, both operations are cheaper than they were on older CPUs.
    Longer floating-point formats need more time: 64-bit takes 1.5 to 2 times as long as 32-bit, while 80-bit costs only slightly more than 64-bit.
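
The comparison can be checked directly from the quoted latencies. A small sketch; the helper name sqrt_to_div_ratio and the constant names are mine, the tick counts are the ones listed above:

```cpp
// Ratio of SQRT latency to DIVIDE latency for one operand width.
constexpr double sqrt_to_div_ratio(int sqrt_ticks, int div_ticks) {
    return static_cast<double>(sqrt_ticks) / div_ticks;
}

// Core 2 Duo E6700
constexpr double core2_float       = sqrt_to_div_ratio(29, 18);  // about 1.61
constexpr double core2_double      = sqrt_to_div_ratio(58, 32);  // about 1.81
constexpr double core2_long_double = sqrt_to_div_ratio(69, 38);  // about 1.82

// Sandy Bridge
constexpr double sandy_float       = sqrt_to_div_ratio(14, 14);  // 1.0
constexpr double sandy_double      = sqrt_to_div_ratio(21, 22);  // about 0.95
constexpr double sandy_long_double = sqrt_to_div_ratio(24, 24);  // 1.0
```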

    Also, newer CPUs have vector operations (SSE, SSE2, AVX) that run at the same speed as the scalar (x87) ones. A vector holds 2-4 values of the same type. If you can arrange your loop to apply the same operation to several FP values at once, you will get more performance out of the CPU.
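
As an illustration of a loop arranged that way, here is a sketch in which every iteration is independent of the others, which is the shape a compiler needs before it can use vector instructions. The function name sqrt_all is hypothetical, and whether the loop is actually vectorized depends on your compiler and flags (for example -O3, and with GCC possibly -fno-math-errno), so inspect the generated assembly to be sure.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Applies sqrt to every element. No iteration reads a result produced by
// another one, so several elements can be processed per instruction.
std::vector<float> sqrt_all(const std::vector<float>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = std::sqrt(in[i]);
    }
    return out;
}
```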
