The Archive Base
Editorial Team
Asked: May 26, 2026 (2026-05-26T14:35:38+00:00)

How can one cast an int to a float, step by step?

I don't understand how an int is cast to a float, step by step. Assume I have a signed integer in binary format and I want to cast it to a float by hand, but I can't work out how. Can someone show me how to do that conversion step by step?

I do that conversion in C many times, like this:

  int a = foo();
  float f = (float)a;

But I haven't figured out what happens in the background. To understand it well, I want to do the conversion by hand.

EDIT: If you know a lot about these conversions, you could also explain float to double conversion, and float to int.

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
    Editorial Team
    Added an answer on May 26, 2026 at 2:35 pm

    Floating point values (IEEE754 ones, anyway) basically have three components:

    • a sign s;
    • a series of exponent bits e; and
    • a series of mantissa bits m.

    The precision dictates how many bits are available for the exponent and mantissa. Let’s examine the value 0.1 for single-precision floating point:

    s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
    0 01111011 10011001100110011001101
               ||||||||||||||||||||||+- 8388608
               |||||||||||||||||||||+-- 4194304
               ||||||||||||||||||||+--- 2097152
               |||||||||||||||||||+---- 1048576
               ||||||||||||||||||+-----  524288
               |||||||||||||||||+------  262144
               ||||||||||||||||+-------  131072
               |||||||||||||||+--------   65536
               ||||||||||||||+---------   32768
               |||||||||||||+----------   16384
               ||||||||||||+-----------    8192
               |||||||||||+------------    4096
               ||||||||||+-------------    2048
               |||||||||+--------------    1024
               ||||||||+---------------     512
               |||||||+----------------     256
               ||||||+-----------------     128
               |||||+------------------      64
               ||||+-------------------      32
               |||+--------------------      16
               ||+---------------------       8
               |+----------------------       4
               +-----------------------       2
    

    The sign is positive, that’s pretty easy.

    The exponent is 64+32+16+8+2+1 = 123, and 123 - 127 (the bias) = -4, so the multiplier is 2^-4 or 1/16. The bias is there so that you can get really small numbers (like 10^-30) as well as large ones.

    The mantissa is chunky. It consists of 1 (the implicit base) plus the value of every set bit, with bit n being worth 1/(2^n) as n starts at 1 and increases to the right: {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.

    When you add all these up, you get 1.60000002384185791015625.

    When you multiply that by the 2^-4 multiplier, you get 0.100000001490116119384765625, which is why they say you cannot represent 0.1 exactly as an IEEE754 float.
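    The arithmetic above can be checked mechanically. The following is a minimal sketch, assuming that float is a 32-bit IEEE754 single-precision type and that only normal values (not zero, denormals, infinities or NaN) are passed in. The name decode_float is made up for illustration; it is not a library function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild the value of a normal single-precision float from its three
   bit fields. decode_float is a hypothetical helper name. */
double decode_float(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);          /* assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);           /* copy the raw bit pattern */

    uint32_t sign     = bits >> 31;           /* 1 bit   */
    int      exponent = (bits >> 23) & 0xFF;  /* 8 bits  */
    uint32_t mantissa = bits & 0x7FFFFFu;     /* 23 bits */

    /* This sketch only covers normal values. */
    assert(exponent != 0 && exponent != 255);

    /* Implicit 1 plus the mantissa bits, each worth 1/(2^n). */
    double significand = 1.0 + (double)mantissa / 8388608.0;

    /* Multiplier is 2^(exponent - 127); build it by repeated doubling
       or halving so that no math library is needed. */
    double scale = 1.0;
    int e = exponent - 127;
    for (; e > 0; e--) scale *= 2.0;
    for (; e < 0; e++) scale /= 2.0;

    double value = significand * scale;
    return sign ? -value : value;
}
```

    Under those assumptions, decode_float(0.1f) should come back as 0.100000001490116119384765625, the same figure as the hand calculation.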

    In terms of converting integers to floats, if the mantissa (including the implicit 1) has at least as many bits as the integer, you can just transfer the integer bit pattern over and select the correct exponent. There will be no loss of precision. For example, a double precision IEEE754 value (64 bits, 52/53 of those being mantissa) has no problem taking on a 32-bit integer.

    If there are more bits in your integer (such as a 32-bit integer and a 32-bit single precision float, which only has 23/24 bits of mantissa) then you need to scale the integer.

    This involves stripping off the least significant bits (rounding actually) so that it will fit into the mantissa bits. That involves loss of precision of course but that’s unavoidable.
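    The by-hand procedure described above (find the leading 1, pick the exponent, then copy or round the remaining bits) could be sketched roughly as follows. This is an illustration, not how any particular compiler or CPU implements the cast. It assumes a 32-bit int, a 32-bit IEEE754 float, and the default round-to-nearest, ties-to-even mode. The names int_to_float_bits and bits_to_float are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a 32-bit signed integer to the bit pattern of an IEEE754
   single-precision float. Hypothetical helper for illustration. */
uint32_t int_to_float_bits(int32_t n) {
    if (n == 0)
        return 0;                              /* +0.0 is all zero bits */

    uint32_t sign = (n < 0) ? 0x80000000u : 0;
    /* Magnitude as unsigned; this form is also safe for INT32_MIN. */
    uint32_t mag = (n < 0) ? 0u - (uint32_t)n : (uint32_t)n;

    /* Step 1: find the position of the highest set bit (0..31). */
    int top = 31;
    while ((mag & (1u << top)) == 0)
        top--;

    /* Step 2: the biased exponent is that position plus 127. */
    uint32_t exponent = (uint32_t)top + 127;

    /* Step 3: line the value up so the leading 1 sits at bit 23. */
    uint32_t mantissa;
    if (top <= 23) {
        mantissa = mag << (23 - top);          /* fits exactly */
    } else {
        int shift = top - 23;                  /* bits that must be dropped */
        uint32_t dropped = mag & ((1u << shift) - 1);
        uint32_t half = 1u << (shift - 1);
        mantissa = mag >> shift;
        /* Round to nearest, ties to even. */
        if (dropped > half || (dropped == half && (mantissa & 1)))
            mantissa++;
        /* Rounding may carry out of the top, e.g. 1.11...1 -> 10.00...0. */
        if (mantissa == (1u << 24)) {
            mantissa >>= 1;
            exponent++;
        }
    }

    /* Step 4: drop the implicit leading 1 and assemble the fields. */
    return sign | (exponent << 23) | (mantissa & 0x7FFFFFu);
}

/* Reinterpret a 32-bit pattern as a float so it can be compared
   against the result of an ordinary cast. */
float bits_to_float(uint32_t bits) {
    float f;
    assert(sizeof f == sizeof bits);           /* assumes a 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

    For 123456789 the highest set bit is bit 26, so three low bits (101) have to go; they are more than half a step, so the mantissa rounds up and the result is 123456792.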


    Let’s have a look at a specific value, 123456789. The following program dumps the bits of each data type.

    #include <stdio.h>
    
    /* Print the bytes at addr as bits, highest address first, so that the
       most significant byte comes first on a little-endian machine. */
    static void dumpBits (const char *desc, const unsigned char *addr, size_t sz) {
        unsigned char mask;
        printf ("%s:\n  ", desc);
        while (sz-- != 0) {
            putchar (' ');
            /* Walk the current byte from its top bit down to its bottom bit. */
            for (mask = 0x80; mask > 0; mask >>= 1)
                if (((addr[sz]) & mask) == 0)
                    putchar ('0');
                else
                    putchar ('1');
        }
        putchar ('\n');
    }
    
    int main (void) {
        int intNum = 123456789;
        float fltNum = intNum;
        double dblNum = intNum;
    
        printf ("%d %f %f\n",intNum, fltNum, dblNum);
        dumpBits ("integer", (const unsigned char *)(&intNum), sizeof (int));
        dumpBits ("float", (const unsigned char *)(&fltNum), sizeof (float));
        dumpBits ("double", (const unsigned char *)(&dblNum), sizeof (double));
    
        return 0;
    }
    

    The output on my system is as follows:

    123456789 123456792.000000 123456789.000000
    integer:
       00000111 01011011 11001101 00010101
    float:
       01001100 11101011 01111001 10100011
    double:
       01000001 10011101 01101111 00110100 01010100 00000000 00000000 00000000
    

    And we’ll look at these one at a time. First the integer, simple powers of two:

       00000111 01011011 11001101 00010101
            |||  | || || ||  || |    | | +->          1
            |||  | || || ||  || |    | +--->          4
            |||  | || || ||  || |    +----->         16
            |||  | || || ||  || +---------->        256
            |||  | || || ||  |+------------>       1024
            |||  | || || ||  +------------->       2048
            |||  | || || |+---------------->      16384
            |||  | || || +----------------->      32768
            |||  | || |+------------------->      65536
            |||  | || +-------------------->     131072
            |||  | |+---------------------->     524288
            |||  | +----------------------->    1048576
            |||  +------------------------->    4194304
            ||+---------------------------->   16777216
            |+----------------------------->   33554432
            +------------------------------>   67108864
                                             ==========
                                              123456789
    

    Now let’s look at the single precision float. Notice that the bit pattern of the mantissa is a near-perfect match for the integer:

    mantissa:       11 01011011 11001101 00011    (spaced out).
    integer:  00000111 01011011 11001101 00010101 (untouched).
    

    There’s an implicit 1 bit to the left of the mantissa and it’s also been rounded at the other end, which is where that loss of precision comes from (the value changing from 123456789 to 123456792 as in the output from that program above).

    Working out the values:

    s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
    0 10011001 11010110111100110100011
               || | || ||||  || |   |+- 8388608
               || | || ||||  || |   +-- 4194304
               || | || ||||  || +------  262144
               || | || ||||  |+--------   65536
               || | || ||||  +---------   32768
               || | || |||+------------    4096
               || | || ||+-------------    2048
               || | || |+--------------    1024
               || | || +---------------     512
               || | |+-----------------     128
               || | +------------------      64
               || +--------------------      16
               |+----------------------       4
               +-----------------------       2
    

    The sign is positive. The exponent is 128+16+8+1 = 153, and 153 - 127 (the bias) = 26, so the multiplier is 2^26 or 67108864.

    The mantissa is 1 (the implicit base) plus (as explained above), {1/2, 1/4, 1/16, 1/64, 1/128, 1/512, 1/1024, 1/2048, 1/4096, 1/32768, 1/65536, 1/262144, 1/4194304, 1/8388608}. When you add all these up, you get 1.83964955806732177734375.

    When you multiply that by the 2^26 multiplier, you get 123456792, the same as the program output.

    The double bit pattern, split into its fields, is:

    s eeeeeeeeeee mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
    0 10000011001 1101011011110011010001010100000000000000000000000000
    

    I am not going to go through the process of figuring out the value of that beast 🙂 However, I will show the mantissa next to the integer format to show the common bit representation:

    mantissa:       11 01011011 11001101 00010101 000...000 (spaced out).
    integer:  00000111 01011011 11001101 00010101           (untouched).
    

    You can once again see the commonality with the implicit bit on the left and the vastly greater bit availability on the right, which is why there’s no loss of precision in this case.
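    The difference in available mantissa bits can also be seen by round-tripping integers through each type. The sketch below assumes a 32-bit int, a float with a 24-bit significand and a double with a 53-bit one; the function names are invented for this example.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Return 1 if n comes back unchanged after a trip through float.
   The comparison is done in 64 bits so that a float rounded up to
   2^31 does not overflow the integer conversion. */
int exact_in_float(int32_t n) {
    return (int64_t)(float)n == (int64_t)n;
}

/* Same check for double, which has more significand bits than int has
   value bits, so it is expected to succeed for every 32-bit int. */
int exact_in_double(int32_t n) {
    return (int64_t)(double)n == (int64_t)n;
}

/* Search upwards for the first positive integer that float cannot
   hold exactly. With a 24-bit significand this is 2^24 + 1. */
int32_t first_int_lost_by_float(void) {
    assert(FLT_MANT_DIG == 24);    /* assumption stated in the lead-in */
    for (int32_t n = 1; n < INT32_MAX; n++)
        if (!exact_in_float(n))
            return n;
    return 0;                      /* not reached under the assumption */
}
```

    On a system matching those assumptions, 123456789 should pass exact_in_double but fail exact_in_float, in line with the program output shown earlier.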


    In terms of converting between floats and doubles, that’s also reasonably easy to understand.

    You first have to check the special values such as NaN and the infinities. These are indicated by special exponent/mantissa combinations and it’s probably easier to detect these up front and generate the equivalent in the new format.

    Then in the case where you’re going from double to float, you obviously have less of a range available to you since there are fewer bits in the exponent. If your double is outside the range of a float, you need to handle that.

    Assuming it will fit, you then need to:

    • rebase the exponent (the bias is different for the two types).
    • copy as many bits from the mantissa as will fit (rounding if necessary).
    • pad out the rest of the target mantissa (if any) with zero bits.
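    For the widening direction (float to double) the steps above need no rounding, so they are short enough to sketch. This is a rough illustration only, assuming 32-bit IEEE754 floats and 64-bit IEEE754 doubles. Float denormals are deliberately left out, and the narrowing direction, which needs rounding and range checks, is not shown. The names float_bits_to_double_bits and widen_by_hand are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen the bit pattern of a single-precision float to the bit pattern
   of the equivalent double. Hypothetical helper for illustration. */
uint64_t float_bits_to_double_bits(uint32_t f) {
    uint64_t sign     = (uint64_t)(f >> 31) << 63;
    uint32_t exponent = (f >> 23) & 0xFF;
    uint64_t mantissa = f & 0x7FFFFFu;

    /* Special values first: all-ones exponent means infinity or NaN. */
    if (exponent == 0xFF)
        return sign | ((uint64_t)0x7FF << 52) | (mantissa << 29);

    /* Zero exponent means zero or a denormal. */
    if (exponent == 0) {
        assert(mantissa == 0);     /* denormals are not handled here */
        return sign;               /* +0.0 or -0.0 */
    }

    /* Rebase: remove the float bias (127), add the double bias (1023). */
    uint64_t rebased = (uint64_t)exponent - 127 + 1023;

    /* The 23 mantissa bits go at the top of the 52-bit field; the
       remaining 29 low bits are padded with zeros. */
    return sign | (rebased << 52) | (mantissa << 29);
}

/* Convenience wrapper that works on values instead of bit patterns. */
double widen_by_hand(float x) {
    uint32_t in;
    uint64_t out;
    double d;
    assert(sizeof x == sizeof in && sizeof d == sizeof out);
    memcpy(&in, &x, sizeof in);
    out = float_bits_to_double_bits(in);
    memcpy(&d, &out, sizeof d);
    return d;
}
```

    Under those assumptions, widen_by_hand(x) should compare equal to (double)x for any normal float x, since every float value is exactly representable as a double.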