
The Archive Base


Editorial Team
Asked: June 7, 2026

I need to convert a 32-bit IEEE 754 float to a signed Q19.12 fixed-point format


I need to convert a 32-bit IEEE 754 float to a signed Q19.12 fixed-point format. The problem is that the conversion must be fully deterministic, so the usual (int)(f * (1 << FRACTION_SHIFT)) is not suitable, since it relies on floating-point math that is not guaranteed to be deterministic. Are there any “bit fiddling” or similar deterministic conversion methods?

Edit: Deterministic here means that the same floating-point input must produce exactly the same conversion result on every platform.
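For reference, assuming the common reading of Q19.12 as a 32-bit two's-complement integer with 1 sign bit, 19 integer bits and 12 fraction bits, a stored integer r represents the value v below (this is also the range the answer's overflow checks against int enforce):

$$
v = \frac{r}{2^{12}}, \quad r \in [-2^{31},\, 2^{31} - 1]
\quad\Longrightarrow\quad
v \in [-524288,\; 524288 - 2^{-12}], \quad \text{resolution } 2^{-12} = 0.000244140625
$$

For example, 1.5 is stored as r = 1.5 · 4096 = 6144, and -0.25 as r = -1024.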


1 Answer

    Editorial Team
    Added an answer on June 7, 2026 at 11:59 pm

    While @StephenCanon’s answer might be right that this particular case is fully deterministic, I decided to stay on the safe side and still do the conversion manually. This is the code I ended up with (thanks to @CodesInChaos for pointers on how to do it):

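    // Member of the Fixed type: FRACTION_SHIFT (12 for Q19.12) and Fixed.FromRaw(int)
    // are defined there. Requires `using System;`.
    public static Fixed FromFloatSafe(float f) {
        unchecked {
            // Extract the float's bits (GetBytes/ToUInt32 use the same byte order, so this is endian-safe)
            uint fb = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
            uint sign = (uint)((int)fb >> 31);   // 0 for positive, 0xFFFFFFFF for negative
            uint exponent = (fb >> 23) & 0xFF;   // biased exponent field
            uint mantissa = fb & 0x007FFFFF;     // 23-bit fraction field

            // Reject Infinity, SNaN and QNaN
            if (exponent == 255) {
                throw new ArgumentException("Infinity and NaN cannot be converted to fixed point.", nameof(f));
            }
            if (exponent != 0) {
                // Normal number: add the mantissa's implicit leading 1
                mantissa |= 0x800000;
            } else {
                // Subnormal number: no implicit 1, and the effective exponent is 1, not 0
                exponent = 1;
            }

            // Mantissa with the sign applied (two's complement negation when sign is all ones)
            int raw = (int)((mantissa ^ sign) - sign);
            // |f| = mantissa * 2^(exponent - 127 - 23), so the fixed-point raw value
            // is raw * 2^(exponent - 150 + FRACTION_SHIFT)
            int shift = (int)exponent - 127 - 23 + FRACTION_SHIFT;

            // Do the shifting and check for overflows
            if (shift > 30) {
                throw new OverflowException();
            } else if (shift > 0) {
                long wide = (long)raw << shift;
                if (wide > int.MaxValue || wide < int.MinValue) {
                    throw new OverflowException();
                }
                raw = (int)wide;
            } else {
                // C# masks int shift counts to 5 bits, so clamp the count to 31.
                // The arithmetic shift rounds toward negative infinity for negative values.
                raw >>= Math.Min(-shift, 31);
            }

            return Fixed.FromRaw(raw);
        }
    }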
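The shift used in the code follows from the IEEE 754 single-precision encoding. Writing s for the sign bit, E for the 8-bit biased exponent field, m for the 23-bit fraction field, F for FRACTION_SHIFT and r for the raw fixed-point integer, every finite float decodes exactly as:

$$
f = (-1)^{s}\, M \cdot 2^{E' - 150}, \qquad
M = \begin{cases} 2^{23} + m, & E \neq 0 \\ m, & E = 0 \end{cases}, \qquad
E' = \max(E, 1)
$$

$$
r = f \cdot 2^{F} = (-1)^{s}\, M \cdot 2^{E' - 150 + F}, \qquad F = 12 \;\Longrightarrow\; \text{shift} = E' - 138
$$

A positive shift is a left shift that can overflow the 32-bit range; a negative shift discards the bits below 2^{-12}, which is the only inexact step. Everything else is a power-of-two scaling and therefore exact.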
    
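As a self-contained way to try the bit-level approach without the Fixed type, here is a minimal sketch that returns the raw Q19.12 integer directly. The class and method names (Q19_12, FromFloatTruncating) are hypothetical. Unlike the answer's code, which rounds negative values toward negative infinity through an arithmetic shift, this variant shifts the magnitude and then applies the sign, so it rounds toward zero like the (int)(f * 4096) cast from the question. It assumes BitConverter.SingleToInt32Bits is available (.NET Core 2.0 or later).

```csharp
using System;

public static class Q19_12
{
    // Number of fraction bits in Q19.12 (same role as FRACTION_SHIFT).
    public const int FractionShift = 12;

    // Converts a float to a raw Q19.12 integer using integer operations only.
    // Rounds toward zero; throws for NaN, Infinity and out-of-range values.
    public static int FromFloatTruncating(float f)
    {
        int bits = BitConverter.SingleToInt32Bits(f);
        bool negative = bits < 0;                 // the sign bit is the int's top bit
        int exponent = (bits >> 23) & 0xFF;       // biased exponent field
        long magnitude = bits & 0x007FFFFF;       // 23-bit fraction field

        if (exponent == 255)
            throw new ArgumentException("NaN or Infinity has no fixed-point value.", nameof(f));

        if (exponent != 0)
            magnitude |= 0x800000;                // normal: implicit leading 1
        else
            exponent = 1;                         // subnormal: effective exponent is 1

        // |f| = magnitude * 2^(exponent - 150), so |raw| = magnitude * 2^shift
        int shift = exponent - 150 + FractionShift;

        if (shift >= 0)
        {
            // A normal magnitude is at least 2^23, so any shift above 8 exceeds 2^31.
            if (shift > 8)
                throw new OverflowException();
            magnitude <<= shift;
        }
        else
        {
            // Shifting the unsigned magnitude (not the signed value) truncates toward zero.
            magnitude = shift <= -32 ? 0 : magnitude >> -shift;
        }

        long result = negative ? -magnitude : magnitude;
        if (result > int.MaxValue || result < int.MinValue)
            throw new OverflowException();
        return (int)result;
    }
}
```

Because a float times 4096 is exact in double for in-range values, (int)((double)f * 4096.0) makes a convenient single-machine reference for spot checks; the sketch itself never touches floating-point arithmetic, so its result depends only on the input bits.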


Related Questions

I need to convert from fixed point signed Q8 format to fixed point signed
I need a cross-platform library/algorithm that will convert between 32-bit and 16-bit floating point
I'm programming in C++. I need to convert a 24-bit signed integer (stored in
I need to convert a vdproj file to WiX format so that I can
I need to convert an unsigned 64-bit integer into a string. That is in
I need to convert both 32-bit and 64-bit unsigned integers into floating-point values in
I need to convert a string to 7-bit ASCII with even parity in a
I need to convert a Joda-Time DateTime into a String, in the following format:
I have a bunch of .eps files (CMYK) that I need to convert to
I need to convert a C/C++ double to a 64 bit two's complement, where


© 2021 The Archive Base. All Rights Reserved