Editorial Team
Asked: June 11, 2026

i had a problem where i was trying to reconstruct the formula used in an existing system, a fairly simple formula of one input and one output:

y = f(x)

After a lot of puzzling, we managed to figure out the formula that fit our observed data points:

[Image: the fitted formula]

And as you can see our theoretical model fit observed data very well:

[Image: plot of the theoretical model against the observed data]

Except when we plot residual errors (i.e. y = f(x) - actualY), we see some lines appear in the residuals:

[Image: plot of the residual errors, showing lines]

It was obvious that these lines were the result of applying some intermediate rounding in our formula, but it was not obvious where. Eventually it was realized that the original system (the one we’re trying to reverse engineer) is storing values in an intermediate Decimal data type:

  • with 8-bit precision of the fraction
  • using the 0.5 round-up rounding model

We could simulate this 8-bit precision in the fraction by:

multiply by 256 (i.e. 2^8)
apply the round
divide by 256 (i.e. 2^8)
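
The three steps above can be sketched in Python as follows. This is only an illustration: the function name `round_to_fraction_bits` is made up, and it assumes "0.5 round-up" means that halves always round toward positive infinity.

```python
import math

def round_to_fraction_bits(x: float, bits: int = 8) -> float:
    """Round x to `bits` binary fractional places, with halves rounding up."""
    scale = 2 ** bits                            # scale factor for the chosen number of bits
    return math.floor(x * scale + 0.5) / scale   # multiply, round, divide

print(round_to_fraction_bits(3.14159))  # 3.140625
```

The scale factor is a power of two, so the multiply and divide are exact in Double arithmetic; only the `floor` step changes the value.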

Changing our equation above into:

[Image: the formula with the intermediate rounding applied]

This reduces the residual errors significantly:

[Image: plot of the reduced residual errors]

Now, all of that above has no relevance to my question except:

  1. To show that simulating the numerical representation in the computer can help the model
  2. To get people’s attention with pretty pictures and colors
  3. Silence critics who would refuse to contribute until i explain why i’m asking my question

Now i want to simulate Single Precision floating point numbers, inside a programming language (and Excel) which use Double Precision floating point numbers. i want to do this because i think it is what’s needed.

In the above example i thought the original system was using a Decimal data type with fixed 8-bit fractional precision using 0.5 round-up rules. i then had to find a way to simulate that computation model with Double math. Now i think the original system is using Single precision math, that i want to simulate using Double.

How do i simulate single-precision rounding using doubles?

In my current model, i once again have residuals that fall into the regular linear patterns – that are a tell-tale sign of rounding:

[Image: plot of the residuals, showing regular linear patterns]

The problem is that the error becomes larger, and only visible, as my input variables become larger. i realized this is likely caused by the fact that all floating point numbers are normalized into IEEE 754 “scientific notation”.
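
To illustrate why the error would grow with the input: the gap between adjacent single-precision values doubles each time the magnitude crosses a power of two. A rough Python sketch, using a made-up helper name `single_precision_spacing` and assuming a nonzero input in the normal (non-subnormal) single-precision range:

```python
import math

def single_precision_spacing(x: float) -> float:
    """Gap between adjacent single-precision values near x (x nonzero, normal range)."""
    _, exponent = math.frexp(x)             # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 24)   # 24 significant bits in single precision

for x in (1.0, 1000.0, 1234567898.76543):
    print(x, single_precision_spacing(x))
```

For a value near 1.2 billion the gap is 128, so rounding errors of up to 64 would be expected there, while near 1.0 the gap is only 2^-23.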

And even if i’m wrong, i still want to try it.

And even if i don’t want to try it, i’m still asking the question

How do i simulate Single precision rounding using Doubles?


It seems to me i could still apply the concept of “rounding after 8 fractional bits” (although 24 bits for Single precision floating point), as long as i can first “normalize” the value. e.g.

1234567898.76543

needs to be converted into (something similar to):

1.23456789876543E+09

Then i could apply my “round to the 24th bit” (i.e. 2^24 = 16,777,216) to the normalized mantissa:

floor(1.23456789876543 * 16777216 + 0.5) / 16777216;

The problem, then, is what combination of sign, abs, ln, exp (or other functions) can i possibly apply so that i can “normalize” my value, round it to the n-th binary place, then “denormalize” it?

Note: i realize IEEE representation keeps a binary 1 as the most significant bit. i might not need to duplicate that behavior in order to get correct results. So it’s not a deal-breaker, nor is it cause to suggest that the entire approach is a failure.

See also

  • How to emulate single precision float operations in PHP?
  • Simulate single precision arithmetic in Matlab?
  • IEEE 754 Floating Point
1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 6:41 pm

    You want to use the library functions frexp and ldexp, which are standard C99 functions, and are available in Lua.

    frexp takes a floating point number and separates the mantissa from the exponent. The resulting mantissa is either 0 or in one of the ranges [0.5, 1.0) or (-1.0, -0.5]. You can then remove any extra bits in the obvious way (floor(mantissa * 2^k)/2^k for non-negative values, for example). (Edited to add:) It would be better to subtract k from the exponent in the call to ldexp than to do the divide as shown, because I’m pretty sure that Lua doesn’t guarantee that 2^k is precise.

    ldexp is the inverse of frexp; you can use that to put the truncated number back together again.
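
As a minimal sketch of the frexp/ldexp approach, here it is in Python, whose `math` module exposes the same two functions. The names `round_to_single` and `float32_reference` are made up for illustration. The sketch rounds to nearest with ties to even instead of truncating with floor, on the assumption that the original system used the default IEEE 754 rounding mode, and it does not handle overflow or subnormal results.

```python
import math
import struct

def round_to_single(x: float) -> float:
    """Round a double to 24 significant bits (normal single-precision range only)."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    mantissa, exponent = math.frexp(x)       # x == mantissa * 2**exponent
    scaled = round(mantissa * 2 ** 24)       # integer result, ties go to even
    return math.ldexp(scaled, exponent - 24) # fold the 2**24 into the exponent

def float32_reference(x: float) -> float:
    """Round-trip through a real 4-byte float, for comparison."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(round_to_single(1234567898.76543))    # 1234567936.0
print(float32_reference(1234567898.76543))  # 1234567936.0
```

Subtracting 24 from the exponent in the `ldexp` call, rather than dividing by 2^24 afterwards, follows the suggestion above and avoids relying on the power being computed exactly.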

    I have no idea how to do this in Excel. Check the manual 🙂 (Edited to add:) I suppose you could get roughly the same effect by dividing the number by 2 to the power of the ceiling of the log 2 of the number, and then doing the binary round as indicated above, and then reversing the process to recreate the original exponent. But I suspect the results would occasionally run into peculiarities with Excel’s peculiar ideas about arithmetic.
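
The log-based variant could look roughly like this, written in Python but restricted to abs, log, ceiling, floor and powers of two, on the assumption that a spreadsheet offers equivalents. The name `round_to_single_via_log` is hypothetical, and the rounding here is round-half-up, so it may differ from true single precision on exact ties.

```python
import math

def round_to_single_via_log(x: float, bits: int = 24) -> float:
    """Round x to `bits` significant bits using only log, ceiling, floor and powers."""
    if x == 0.0:
        return 0.0
    exponent = math.ceil(math.log2(abs(x)))     # abs(x) / 2**exponent lies in (0.5, 1]
    scale = 2.0 ** (bits - exponent)            # normalize and shift in a single step
    return math.floor(x * scale + 0.5) / scale  # binary round, then restore the exponent

print(round_to_single_via_log(1234567898.76543))  # 1234567936.0
```

Because `scale` is a power of two, the multiply and divide do not introduce extra error in Python; whether the same holds in a spreadsheet would need checking.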
