The Archive Base

Editorial Team
Asked: June 17, 2026

I’m working on a processor without a floating-point unit, so I have to use fixed point or a custom floating-point type for a user interface.

What does the performance of, say, a multiply look like for these three types:

  1. IEEE Float (32)
  2. Custom 32 bit float class with a 16 bit signed value and a signed 16 bit exponent
  3. 32-bit fixed decimal

I want something that will also scale to a processor with a floating-point unit. Will the custom float be competitive performance-wise with an IEEE float? I’ve heard that the performance of IEEE floats is terrible on processors without FPUs. Is that because of all the ANDing and ORing needed when the 24-bit value is not a native size? That is, will the custom float class mitigate that performance problem?

Any help would be greatly appreciated!

1 Answer

  1. Editorial Team, added an answer on June 17, 2026 at 8:21 pm

    Software-emulated IEEE floats and doubles are slow because of the many edge cases that have to be checked for and handled properly:

    • +/-infinity in input
    • Not-A-Number in input
    • +/-0 in input
    • normalized vs denormalized number in input and the implicit ‘1’ in the mantissa
    • unpacking and packing
    • normalization/denormalization
    • under- and overflow checks
    • correct rounding, which can lead to extra (de)normalization and/or underflow/overflow

    If you roughly count the above as primitive micro-operations (one per item on the list), you get close to 10. There will be many more in the worst case.

    So, if you’re interested in IEEE-compliant floating-point arithmetic, expect every emulated operation to be something like 30x slower than its integer counterpart (CodesInChaos’s comment is timely with the 38 clocks per addition/multiplication).
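
To give a feel for the unpacking and classification steps from the list above, here is a minimal sketch. It assumes `float` is IEEE 754 binary32 on the target; the names `FloatClass`, `Unpacked` and `unpack` are made up for this example and do not come from any particular soft-float library. It covers only the front end of an emulated operation, not the arithmetic, rounding or repacking.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559,
              "sketch assumes float is IEEE 754 binary32");

// Categories an emulated operation has to tell apart before doing any math.
enum class FloatClass { Zero, Subnormal, Normal, Infinity, NaN };

// Hypothetical unpacked form used only in this example.
struct Unpacked {
    bool          negative;     // sign bit
    int           exponent;     // unbiased exponent
    std::uint32_t significand;  // 24 bits, implicit '1' made explicit for normals
    FloatClass    cls;
};

inline Unpacked unpack(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // read the bit pattern without aliasing issues

    const std::uint32_t exp_field  = (bits >> 23) & 0xFFu;
    const std::uint32_t frac_field = bits & 0x7FFFFFu;

    Unpacked u;
    u.negative = (bits >> 31) != 0;
    if (exp_field == 0xFFu) {
        // All-ones exponent: infinity if the fraction is zero, NaN otherwise.
        u.cls = frac_field ? FloatClass::NaN : FloatClass::Infinity;
        u.exponent = 0;
        u.significand = frac_field;
    } else if (exp_field == 0u) {
        // Zero exponent field: +/-0 or a denormalized number (no implicit '1').
        u.cls = frac_field ? FloatClass::Subnormal : FloatClass::Zero;
        u.exponent = -126;
        u.significand = frac_field;
    } else {
        // Normalized number: remove the bias and restore the implicit '1'.
        u.cls = FloatClass::Normal;
        u.exponent = static_cast<int>(exp_field) - 127;
        u.significand = frac_field | 0x800000u;
    }
    return u;
}
```

Every operand of every emulated operation goes through branches like these before the actual multiply or add starts, which is part of why the per-operation cost adds up.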

    You could cut some corners by choosing a floating-point format with:

    • just one zero
    • no Not-A-Number
    • normalized numbers only
    • no implicit ‘1’ in the mantissa
    • exponent and mantissa each occupying an integral number of bytes
    • no or primitive rounding
    • possibly, no infinities
    • possibly, 2’s complement mantissa
    • possibly, no exponent bias
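
A format that cuts these corners could look roughly like the sketch below. It follows the layout from the question (16-bit signed mantissa, 16-bit signed exponent) and assumes one zero, no NaN, no infinities, no implicit '1', no exponent bias, a 2's complement mantissa and truncation instead of rounding. The names `SoftFloat16x16`, `normalize`, `multiply`, `from_int` and `to_double` are hypothetical, and exponent overflow is deliberately left unchecked.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical simplified float: value = mantissa * 2^exponent.
struct SoftFloat16x16 {
    std::int16_t mantissa;  // 2's complement, no implicit '1'
    std::int16_t exponent;  // plain signed exponent, no bias
};

// Bring a 32-bit intermediate into normalized 16-bit form:
// positive mantissas end up in [16384, 32767], negative ones in [-32768, -16385].
inline SoftFloat16x16 normalize(std::int32_t m, std::int32_t e) {
    if (m == 0) return {0, 0};                 // the single representation of zero
    while (m > 32767 || m < -32768) {          // too wide: drop low bits (truncation)
        m /= 2;
        ++e;
    }
    while (m < 16384 && m >= -16384) {         // too narrow: shift up to use all bits
        m *= 2;
        --e;
    }
    // Assumption: e still fits in 16 bits; overflow/underflow is not checked here.
    return {static_cast<std::int16_t>(m), static_cast<std::int16_t>(e)};
}

inline SoftFloat16x16 multiply(SoftFloat16x16 a, SoftFloat16x16 b) {
    // 16x16 -> 32-bit product always fits: |product| <= 2^30.
    const std::int32_t product = static_cast<std::int32_t>(a.mantissa) * b.mantissa;
    return normalize(product, static_cast<std::int32_t>(a.exponent) + b.exponent);
}

inline SoftFloat16x16 from_int(std::int32_t v) { return normalize(v, 0); }

// For inspecting results on a host with an FPU only; not meant for the target.
inline double to_double(SoftFloat16x16 x) {
    return std::ldexp(static_cast<double>(x.mantissa), x.exponent);
}
```

For example, `to_double(multiply(from_int(3), from_int(5)))` yields 15.0. The multiply itself is one integer multiply, one add and a normalization loop, with no special-case branches apart from zero. Whether that beats a tuned soft-float library on a given CPU still has to be measured.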

    Fixed-point arithmetic may turn out to be much faster. The usual problem with it is that you have to know the ranges of all inputs and intermediate results beforehand, so that you can choose a format that avoids overflow. You’ll also likely need to support several different fixed-point formats, e.g. 16.16, 32.32, 8.24, 0.32. C++ templates may help reduce code duplication here.
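
As a sketch of how a template can cover several 32-bit formats with one definition, consider the following. `Fixed32`, `Q16_16` and `Q8_24` are names invented for this example, and it assumes a 64-bit intermediate type is available for the product. Results that do not fit the chosen format are not detected, which is exactly the range problem described above.

```cpp
#include <cstdint>

// Hypothetical 32-bit fixed-point type with FracBits fractional bits.
template <int FracBits>
struct Fixed32 {
    static_assert(FracBits >= 0 && FracBits <= 30, "fractional bits must be in 0..30");

    std::int32_t raw;  // stored value = real value * 2^FracBits

    static Fixed32 from_int(std::int32_t v) {
        // Assumption: v fits in the integer part of this format.
        return Fixed32{static_cast<std::int32_t>(
            static_cast<std::int64_t>(v) * (std::int64_t(1) << FracBits))};
    }

    friend Fixed32 operator+(Fixed32 a, Fixed32 b) {
        return Fixed32{a.raw + b.raw};  // plain integer add, no overflow check
    }

    friend Fixed32 operator*(Fixed32 a, Fixed32 b) {
        // Widen so the 2*FracBits fractional bits of the product are not lost,
        // then scale back. Dividing by a power of two truncates toward zero.
        const std::int64_t wide = static_cast<std::int64_t>(a.raw) * b.raw;
        return Fixed32{static_cast<std::int32_t>(wide / (std::int64_t(1) << FracBits))};
    }
};

using Q16_16 = Fixed32<16>;  // 16 integer bits, 16 fractional bits
using Q8_24  = Fixed32<24>;  // 8 integer bits, 24 fractional bits
```

With this, `Q16_16::from_int(3) * Q16_16::from_int(5)` has `raw == 15 << 16`, and `Q16_16{0x8000}` represents 0.5. On a CPU without a 32x32 to 64-bit multiply instruction, the widening multiply becomes a short library routine, so the actual cost is again something to measure.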

    In any event, the best you can do is define your problem, solve it with both floating-point and fixed-point arithmetic, observe which of the two works best on each CPU, and choose the winner.

    EDIT: For an example of a simpler floating-point format, take a look at the MIL-STD-1750A’s 32-bit floating point format:

     MSB                                         LSB MSB          LSB
    ------------------------------------------------------------------
    | S|                   Mantissa                 |    Exponent    |
    ------------------------------------------------------------------
      0  1                                        23 24            31
    

    Floating point numbers are represented as a fractional mantissa times 2 raised to the power of the exponent. All floating point numbers are assumed normalized or floating point zero at the beginning of a floating point operation and the results of all floating point operations are normalized (a normalized floating point number has the sign of the mantissa and the next bit of opposite value) or floating point zero. A floating point zero is defined as 0000 0000 (hex), that is, a zero mantissa and a zero exponent (00 hex). An extended floating point zero is defined as 0000 0000 0000 (hex), that is, a zero mantissa and a zero exponent. Some examples of the machine representation for 32-bit floating point numbers:

    Decimal Number            Hexadecimal Notation
    (Mantissa x Exp)          (Mantissa Exp)
     0.9999998 x 2^127        7FFFFF 7F
     0.5       x 2^127        400000 7F
     0.625     x 2^4          500000 04
     0.5       x 2^1          400000 01
     0.5       x 2^0          400000 00
     0.5       x 2^-1         400000 FF
     0.5       x 2^-128       400000 80
     0.0       x 2^0          000000 00
    -1.0       x 2^0          800000 00
    -0.5000001 x 2^-128       BFFFFF 80
    -0.7500001 x 2^4          9FFFFF 04
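
To make the table concrete, here is a minimal sketch of a decoder for this layout, based only on the description quoted above: a 24-bit 2's complement fractional mantissa in the upper bits and an 8-bit 2's complement exponent in the lower bits. The function name `decode_1750a` is made up for the example, and it uses `double` purely to check values on a host machine.

```cpp
#include <cmath>
#include <cstdint>

// Convert a 32-bit word in the layout shown above to a double for inspection.
inline double decode_1750a(std::uint32_t word) {
    // Upper 24 bits: 2's complement mantissa, sign-extended by hand.
    std::int32_t mantissa = static_cast<std::int32_t>(word >> 8);
    if (mantissa & 0x800000) mantissa -= 0x1000000;

    // Lower 8 bits: 2's complement exponent, no bias.
    int exponent = static_cast<int>(word & 0xFFu);
    if (exponent & 0x80) exponent -= 0x100;

    // The mantissa is a fraction in [-1, 1), i.e. the integer scaled by 2^-23.
    return std::ldexp(static_cast<double>(mantissa), exponent - 23);
}
```

For instance, `decode_1750a(0x50000004)` gives 10.0 (0.625 x 2^4) and `decode_1750a(0x80000000)` gives -1.0, matching the rows of the table. Note that unpacking needs only two shifts, two masks and two sign extensions, with no special bit patterns to test for.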
    