The Archive Base

Editorial Team
Asked: May 22, 2026


I know that the default implementation of Lua uses floating point numbers only, thus circumventing the problem of dynamically determining the subtype of a number before choosing which variant of math function to use.

My question is — if I try to emulate integers as doubles (or floats) in standard C99, is there a reliable (and simple) way to tell what is the maximum value representable precisely?

I mean, if I use 64-bit floats to represent integers, I certainly cannot represent all 64-bit integers (the pigeonhole principle applies here). How can I tell the maximum integer that is representable?

(Trying to list all values is not a solution: if, for example, I'm using doubles on a 64-bit architecture, I'd have to list 2^{64} numbers.)

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 4:08 pm

    The largest integer up to which every whole number can be represented exactly is 2^53 (9007199254740992) for a 64-bit double and 2^24 (16777216) for a 32-bit float. See the base digits on the Wikipedia page for IEEE floating point numbers.

    Verifying this in Lua is pretty simple:

    local maxdouble = 2^53
    
    -- one less than the maximum can be represented precisely
    print (string.format("%.0f",maxdouble-1)) --> 9007199254740991
    -- the maximum itself can be represented precisely
    print (string.format("%.0f",maxdouble))   --> 9007199254740992
    -- one more than the maximum gets rounded down
    print (string.format("%.0f",maxdouble+1)) --> 9007199254740992 again
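
Since the question asks about standard C99, the same limit can probably be read straight from `<float.h>`, which provides `FLT_RADIX`, `FLT_MANT_DIG` and `DBL_MANT_DIG`. The following is a minimal sketch, not a definitive implementation: `max_contiguous_int` is a hypothetical helper name, and it assumes a radix of 2 and a significand narrower than 64 bits.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Largest n such that every integer in [0, n] is exactly representable
 * in a binary floating-point type with `mant_dig` significand digits.
 * Hypothetical helper; pass FLT_MANT_DIG or DBL_MANT_DIG from <float.h>. */
uint64_t max_contiguous_int(int mant_dig)
{
    /* Assumptions: the radix is 2 and the result fits in 64 bits. */
    assert(FLT_RADIX == 2);
    assert(mant_dig > 0 && mant_dig < 64);
    return (uint64_t)1 << mant_dig;
}
```

On a platform where `double` is IEEE 754 binary64 and `float` is binary32, `max_contiguous_int(DBL_MANT_DIG)` gives 9007199254740992 and `max_contiguous_int(FLT_MANT_DIG)` gives 16777216.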
    

    If we don’t have the IEEE-defined field sizes handy, knowing what we know about the design of floating point numbers, we can determine these values using a simple loop over the possible values:

    #include <limits.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    /* Parenthesize macro arguments so they expand safely. */
    #define min(a, b) ((a) < (b) ? (a) : (b))
    #define bits(type) (sizeof(type) * CHAR_BIT)
    /* Double `in` until converting it to test_t and adding 1
       no longer yields in + 1, then report that power of two. */
    #define testimax(test_t) do { \
      uintmax_t in = 1, out = 2; \
      size_t pow = 0, limit = min(bits(test_t), bits(uintmax_t)); \
      while (pow < limit && out == in + 1) { \
        in = in << 1; \
        out = (uintmax_t)((test_t) in + 1); \
        ++pow; \
      } \
      if (pow == limit) \
        puts(#test_t " is as precise as longest integer type"); \
      else printf(#test_t " conversion imprecise for 2^%zu+1:\n" \
        "   in: %ju\n  out: %ju\n\n", pow, in + 1, out); \
    } while (0)
    
    int main(void)
    {
        testimax(float);
        testimax(double);
        return 0;
    }
    

    The output of the above code:

    float conversion imprecise for 2^24+1:
       in: 16777217
      out: 16777216
    
    double conversion imprecise for 2^53+1:
       in: 9007199254740993
      out: 9007199254740992
    

    Of course, due to the way floating-point precision works, a 64-bit double can represent numbers much larger than 2^64 as the exponent grows, just not every integer in between. The Wikipedia page on double-precision floating-point describes this:

    Between 2^52=4,503,599,627,370,496 and 2^53=9,007,199,254,740,992 the representable numbers are exactly the integers. For the next range, from 2^53 to 2^54, everything is multiplied by 2, so the representable numbers are the even ones, etc. Conversely, for the previous range from 2^51 to 2^52, the spacing is 0.5, etc.
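
The spacing described in that quote can be checked by converting an integer to `double` and back. This is only a sketch under stated assumptions: `survives_double_round_trip` is a hypothetical helper, `double` is assumed to be IEEE 754 binary64, and the input is assumed to be at most 2^63 so the conversion back to `uint64_t` stays in range.

```c
#include <stdint.h>

/* Returns 1 if n converts to double and back without change, else 0.
 * Hypothetical helper; assumes n <= 2^63 so that converting the double
 * back to uint64_t is always in range. */
int survives_double_round_trip(uint64_t n)
{
    /* volatile forces the value to be stored as a real double, so extra
     * precision in floating-point registers cannot hide the rounding. */
    volatile double d = (double)n;
    return (uint64_t)d == n;
}
```

With binary64 doubles, 2^53 + 1 should fail the round trip while 2^53 + 2 passes, and in the next range 2^54 + 2 fails while 2^54 + 4 passes, matching the spacings of 2 and 4.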

    The absolute largest value a double can hold is listed further down that page: 0x7fefffffffffffff, which computes to (1 + (1 − 2^−52)) * 2^1023, or roughly 1.7976931348623157e308.
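
To tie that bit pattern back to standard C, one option is to reinterpret it as a `double` and compare it with `DBL_MAX` from `<float.h>`. The sketch below uses a hypothetical helper, `double_from_bits`, and assumes that `double` is IEEE 754 binary64 and has the same size and byte order as `uint64_t`.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets a 64-bit pattern as a double.
 * Hypothetical helper; assumes double is IEEE 754 binary64 and shares
 * its size and byte order with uint64_t. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);  /* copy the bytes instead of casting pointers */
    return d;
}
```

Under those assumptions, `double_from_bits(0x7fefffffffffffffULL)` compares equal to `DBL_MAX`, which can also be written as the C99 hexadecimal literal `0x1.fffffffffffffp+1023`.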
