The Archive Base

Editorial Team
Asked: June 8, 2026

Yes, I know, doing bitwise ops on double values seems like a bad idea, but I actually need it.

You don't need to read the next paragraph to answer my question; it's only for the curious:

I'm trying out a special mod to Mozilla Tamarin (the ActionScript Virtual Machine). In it, every object has its lowest 3 bits reserved for its type (double is 7, for example). These bits reduce the precision of primitive data types (int has only 29 bits, etc.). For my mod, I need to expand this area by 2 bits. This means that when you, for example, add 2 doubles, you need to set these last 5 bits to zero, do the math, then set them again on the result. So much for the why ^^

Now back to the code.
Here is a minimal example that shows a very similar problem:
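The strip, compute, re-tag cycle described above can be sketched roughly as follows. This is only an illustration, not Tamarin's actual code: the names `kTagMask`, `strip_tag`, `apply_tag` and `add_tagged` are hypothetical, the tag is assumed to live in the low 5 bits of a 64-bit pattern, and `memcpy` is used instead of pointer casts to move bits between `double` and an integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical layout: the low 5 bits of the 64-bit pattern hold a type tag.
const std::uint64_t kTagMask = 0x1F;

// Copy the bytes of a double into a 64-bit integer.
inline std::uint64_t double_to_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Copy a 64-bit integer back into a double.
inline double bits_to_double(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Clear the tag bits so the value can take part in arithmetic.
inline double strip_tag(std::uint64_t tagged) {
    return bits_to_double(tagged & ~kTagMask);
}

// Overwrite the low 5 bits of the value's pattern with the given tag.
inline std::uint64_t apply_tag(double value, std::uint64_t tag) {
    assert(tag <= kTagMask);
    return (double_to_bits(value) & ~kTagMask) | tag;
}

// Add two tagged doubles: strip both tags, add, then tag the result.
inline std::uint64_t add_tagged(std::uint64_t a, std::uint64_t b, std::uint64_t tag) {
    return apply_tag(strip_tag(a) + strip_tag(b), tag);
}
```

For example, `add_tagged(apply_tag(15.25, 7), apply_tag(1.5, 7), 7)` yields a pattern whose low 5 bits are 7 and whose stripped value is 16.75. Keeping every mask as a `std::uint64_t` means no intermediate value is ever narrowed to 32 bits.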

double *d = new double; 
*d = 15.25; 
printf("float: %f\n", *d);

//forced hex output of double
printf("forced bitwise of double: ");
unsigned char * c = (unsigned char *) d;
int i;
for (i = sizeof (double)-1; i >=0 ; i--) {
     printf ("%02X ", c[i]);
}
printf ("\n");

//cast to long long-pointer, so that bitops become possible
long long * l = (long long*)d;
//now the bitops: 
printf("IntHex: %016X, float: %f\n", *l, *(double*)l); //this output is wrong!
*l = *l | 0x07; 
printf("last 3 bits set to 1: %016X, float: %f\n", *l, *d);//this output is wrong!
*l = *l | 0x18; 
printf("2 bits more set to 1: %016X, float: %f\n", *l, *d);//this output is wrong!

When I run this in Visual Studio 2008, the first output is correct, and so is the second. The third yields 0 for both the hex and the float representation, which is obviously wrong. The fourth and fifth are also zero for both hex and float, except that the modified bits show up in the hex value. So I thought maybe the typecast messed things up here, and added 2 more outputs:

printf("float2: %f\n", *(double*)(long long*)d); //almost right
printf("float3: %f\n", *d); //almost right

Well, they show 15.25, but it should be 15.2500000000000550670620214078. So I thought, hey, it's just a precision issue in the output. Let's modify a bit further up:

*l = *l |= 0x10000000000;
printf("float4: %f\n", *d);

Again, the output is 15.25(0000), and not 15.2519531250000550670620214078. Weirdly enough, another forced hex output (see code above) shows no modification of d at all. So I tinkered a bit and realized that bit 31 (0x80000000) is the highest one I can set by hand. And holy moly, it actually has an effect on the output (15.250004)!

So, though I strayed slightly, there is still a lot of confusion. Is printf broken? Am I having a big/little-endian confusion here? Am I accidentally creating some kind of buffer overrun?

If anybody is interested: in the original problem (the Tamarin thing, see above) it's pretty much the inverse. There, the last three bits are already set (which represents a double). Setting them to zero works fine (which is the original implementation). Setting 2 more to zero has the same effect as above (the overall value gets floored to zero). By the way, this is not output-specific; math ops also seem to work with those floored values (multiplying 2 values obtained like that results in 0).

Any help would be appreciated.
Greetings.

1 Answer

Editorial Team
Added an answer on June 8, 2026 at 11:30 am

    well, they show 15.25, but it should be 15.2500000000000550670620214078

    By default, %f prints 6 digits after the decimal point, so you won't see the difference. You also need to tell printf that the first argument is a long long rather than an int, using the ll length modifier; otherwise the behaviour is undefined, and it may print garbage for that argument and for the ones after it. If you fix that and use a higher precision, such as %.30f, you should see the expected result:

    printf("last 3 bits set to 1: %016llX, float: %.30f\n", *l, *d);
    printf("2 bits more set to 1: %016llX, float: %.30f\n", *l, *d);
    
    last 3 bits set to 1: 402E800000000007, float: 15.250000000000012434497875801753
    2 bits more set to 1: 402E80000000001F, float: 15.250000000000055067062021407764
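As a hedged sketch of the same idea without pointer casts: the helpers below (the names `double_bits_hex` and `or_bits` are made up for illustration) copy the bits with `memcpy` and format them with the `ll` length modifier. They assume a 64-bit IEEE-754 `double` and a 64-bit `unsigned long long`, which the `static_assert` checks only by size.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Format the bit pattern of a double as 16 upper-case hex digits.
inline std::string double_bits_hex(double d) {
    unsigned long long bits = 0;
    static_assert(sizeof(bits) == sizeof(d), "expects a 64-bit double");
    std::memcpy(&bits, &d, sizeof bits);
    char buf[17];                                     // 16 digits + terminator
    std::snprintf(buf, sizeof buf, "%016llX", bits);  // ll matches unsigned long long
    return std::string(buf);
}

// Return the double whose bit pattern is that of d with `mask` OR-ed in.
inline double or_bits(double d, unsigned long long mask) {
    unsigned long long bits = 0;
    std::memcpy(&bits, &d, sizeof bits);
    bits |= mask;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

A call such as `printf("%s %.30f\n", double_bits_hex(x).c_str(), x)` with `x = or_bits(15.25, 0x1F)` then prints the pattern and the value together, with format specifiers that match the argument types.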
    

    lets modify a bit further up:

    *l = *l |= 0x10000000000;
    printf("float4: %f\n", *d);
    

    You have a rogue = giving undefined behaviour, so the value may or may not end up being modified (and the program may or may not crash, phone out for pizza, or destroy the universe). Also, if your compiler isn’t C++11 compliant, the type of the integer literal might be no larger than long, which might only be 32 bits; in which case it will (probably) become zero.

    Fixing those (and in my case, with your code as it is), I get the expected result:

    *l = *l | 0x10000000000LL;  // just one assignment, and "LL" to force "long long"
    printf("float4: %f\n", *d);
    
    
    float4: 15.251953
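One way to sidestep the literal-width question entirely is to build the mask from a value that is already 64 bits wide. The sketch below is only an illustration under that assumption; `set_bit` is a hypothetical helper, and it again uses `memcpy` rather than a pointer cast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Set bit `n` (0 = least significant) in the bit pattern of a double.
inline double set_bit(double d, unsigned n) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    assert(n < 64);
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    bits |= std::uint64_t(1) << n;  // the shift happens in 64 bits, never in int
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

With an IEEE-754 double, `set_bit(15.25, 40)` gives 15.251953125, which matches the float4 output above, and `set_bit(15.25, 31)` gives the value that prints as 15.250004 with `%f`.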
    

    Here is a demonstration.

© 2021 The Archive Base. All Rights Reserved