The Archive Base

Editorial Team
Asked: May 24, 2026

So apparently on my machine, float, double and long double each have different sizes

So apparently on my machine, float, double and long double each have different sizes. There also doesn’t seem to be a strict standard enforcing how many bytes each of those types has to be.

How would one, then, save a floating point value into a binary file and have it read properly on a different system if the sizes differ? E.g. my machine has 8-byte doubles, whereas Joe’s has 12-byte doubles.

Without having to export it in text form (e.g. “0.3232”), that is. That seems a lot less compact than the binary representation.

1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 9:13 pm

    You have to define a format and implement it. Most of the network
    protocols I know use IEEE float and double, output big-endian (but
    other formats are possible). The advantage of the IEEE formats is that
    they are what most current everyday machines use internally; if
    you’re on one of these machines (and portability of your code to other
    machines, like mainframes, isn’t an issue), you can “convert” to the
    format simply by type-punning to an unsigned integer of the same size
    and outputting that. So, for example, with obstream standing for a
    binary output stream class of your own, you might have:

    obstream&
    operator<<( obstream& dest, uint64_t value )
    {
        dest.put((value >> 56) & 0xFF);
        dest.put((value >> 48) & 0xFF);
        dest.put((value >> 40) & 0xFF);
        dest.put((value >> 32) & 0xFF);
        dest.put((value >> 24) & 0xFF);
        dest.put((value >> 16) & 0xFF);
        dest.put((value >>  8) & 0xFF);
        dest.put((value      ) & 0xFF);
        return dest;
    }
    
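    obstream&
    operator<<( obstream& dest, double value )
    {
        //  Copy the bits with memcpy (from <cstring>); reading a double
        //  through a uint64_t reference violates the aliasing rules.
        uint64_t bits;
        std::memcpy( &bits, &value, sizeof bits );
        return dest << bits;
    }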
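
    The shortcut above only works when double really is a 64 bit IEEE
    type whose byte order matches that of the integers. A minimal sketch
    of how one might check the first two points at compile time and copy
    the bits without reinterpret_cast; the helper names doubleToBits,
    bitsToDouble and checkLayout are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Fail the build early if the assumptions behind the shortcut don't hold.
static_assert(std::numeric_limits<double>::is_iec559,
              "double is not an IEEE 754 type on this platform");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "double is not 64 bits wide");

// Hypothetical helper: copy the bits of a double into a 64 bit integer.
inline std::uint64_t doubleToBits(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical helper: the inverse copy, used when reading.
inline double bitsToDouble(std::uint64_t bits)
{
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Optional runtime sanity check: 1.0 is 0x3FF0000000000000 in IEEE binary64.
// It fails if floating point and integer byte orders differ.
inline void checkLayout()
{
    assert(doubleToBits(1.0) == 0x3FF0000000000000ULL);
}
```

    If either static_assert fires, the arithmetic version is the one to
    use instead.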
    

    If you have to be portable to a machine not using IEEE internally
    (e.g. a mainframe with its own native floating point format), you’ll
    need something a bit more complicated:

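    obstream&
    operator<<( obstream& dest, double value )
    {
        bool                isNeg = value < 0;
        if ( isNeg ) {
            value = - value;
        }
        int                 exp;
        if ( value == 0.0 ) {
            exp = 0;
        } else {
            //  frexp returns a fraction in [0.5, 1); scaling it by 2^53
            //  gives a 53 bit integer mantissa.
            value = ldexp( frexp( value, &exp ), 53 );
            exp += 1022;    //  IEEE bias, adjusted for frexp's range
        }
        uint64_t mant = static_cast< uint64_t >( value );
        //  Sign bit, 11 bit exponent, then the low 52 bits of the mantissa
        //  (bit 52 is the implicit leading 1 and is not stored).
        dest.put( (isNeg ? 0x80 : 0x00) | (exp >> 4) );
        dest.put( ((exp << 4) & 0xF0) | ((mant >> 48) & 0x0F) );
        dest.put( (mant >> 40) & 0xFF );
        dest.put( (mant >> 32) & 0xFF );
        dest.put( (mant >> 24) & 0xFF );
        dest.put( (mant >> 16) & 0xFF );
        dest.put( (mant >>  8) & 0xFF );
        dest.put( (mant      ) & 0xFF );
        return dest;
    }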
    

    (Note that this doesn’t handle NaNs, infinities or subnormal values
    correctly. Personally, I would ban them from the format, since not all
    floating point representations support them. But then, the traditional
    IBM mainframe hexadecimal floating point format can’t represent 1E306
    either, although you can encode it in the IEEE double format above.)
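
    If you do ban such values from the format, it helps to reject them
    before encoding rather than write garbage. A small sketch using
    std::fpclassify from <cmath>; the names isEncodable and
    requireEncodable are hypothetical, and the check classifies the value
    in the local double format, which is assumed not to exceed the range
    of IEEE double:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical predicate: true only for zero and normalized finite values,
// which is all that an frexp/ldexp based encoder like the one above handles.
inline bool isEncodable(double value)
{
    int kind = std::fpclassify(value);
    return kind == FP_ZERO || kind == FP_NORMAL;
}

// Hypothetical guard, meant to be called at the top of the encoder in
// debug builds.
inline void requireEncodable(double value)
{
    assert(isEncodable(value));
}
```

    Whether to assert, throw or set the stream's error state on a bad
    value is a design choice the format definition should settle.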

    Reading is, of course, the opposite. Either:

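    ibstream&
    operator>>( ibstream& source, uint64_t& results )
    {
        //  Widen each byte before shifting: assuming get() returns an int,
        //  as std::istream::get() does, shifting it by 32 or more is
        //  undefined.
        uint64_t value = static_cast<uint64_t>( source.get() & 0xFF ) << 56;
        value |= static_cast<uint64_t>( source.get() & 0xFF ) << 48;
        value |= static_cast<uint64_t>( source.get() & 0xFF ) << 40;
        value |= static_cast<uint64_t>( source.get() & 0xFF ) << 32;
        value |= static_cast<uint64_t>( source.get() & 0xFF ) << 24;
        value |= static_cast<uint64_t>( source.get() & 0xFF ) << 16;
        value |= static_cast<uint64_t>( source.get() & 0xFF ) <<  8;
        value |= static_cast<uint64_t>( source.get() & 0xFF )      ;
        if ( source )
            results = value;
        return source;
    }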
    
    ibstream&
    operator>>( ibstream& source, double& results)
    {
        uint64_t tmp;
        source >> tmp;
        if ( source ) {
            //  memcpy (from <cstring>) copies the bits without breaking
            //  the aliasing rules.
            std::memcpy( &results, &tmp, sizeof results );
        }
        return source;
    }
    

    or if you can’t count on IEEE:

    ibstream&
    operator>>( ibstream& source, double& results )
    {
        uint64_t tmp;
        source >> tmp;
        if ( source ) {
            double f = 0.0;
            if ( (tmp & 0x7FFFFFFFFFFFFFFF) != 0 ) {
                //  Restore the implicit leading 1, then scale the 53 bit
                //  integer mantissa by the unbiased exponent.
                f = ldexp( static_cast<double>(
                               (tmp & 0x000FFFFFFFFFFFFF) | 0x0010000000000000 ),
                           static_cast<int>( (tmp & 0x7FF0000000000000) >> 52 )
                                    - 1022 - 53 );
            }
            if ( (tmp & 0x8000000000000000) != 0 ) {
                f = -f;
            }
            results = f;
        }
        return source;
    }
    

    (This assumes that the input is not a NaN, an infinity or a subnormal
    value.)
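
    Since obstream and ibstream are not defined here, a self-contained
    way to try the arithmetic approach is to encode into a plain byte
    array and decode it again. The sketch below does that; Bytes,
    encodeDouble and decodeDouble are names made up for illustration, and
    it assumes the value is zero or a normalized finite number within the
    range of IEEE double:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

using Bytes = std::array<unsigned char, 8>;

// Encode as big-endian IEEE 754 binary64 using only arithmetic, so the
// local representation of double does not matter.
inline Bytes encodeDouble(double value)
{
    assert(value == 0.0 || std::isnormal(value));
    bool isNeg = std::signbit(value);
    value = std::fabs(value);
    int exp = 0;
    std::uint64_t mant = 0;
    if (value != 0.0) {
        // frexp yields a fraction in [0.5, 1); times 2^53 it is an integer.
        mant = static_cast<std::uint64_t>(
            std::ldexp(std::frexp(value, &exp), 53));
        exp += 1022;  // IEEE bias, adjusted for frexp's range
    }
    std::uint64_t bits = (isNeg ? 0x8000000000000000ULL : 0ULL)
                       | (static_cast<std::uint64_t>(exp) << 52)
                       | (mant & 0x000FFFFFFFFFFFFFULL);  // drop implicit 1
    Bytes out{};
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>((bits >> (56 - 8 * i)) & 0xFF);
    return out;
}

// Decode the same layout; NaNs, infinities and subnormals are not handled.
inline double decodeDouble(const Bytes& in)
{
    std::uint64_t bits = 0;
    for (unsigned char b : in)
        bits = (bits << 8) | b;
    double result = 0.0;
    if ((bits & 0x7FFFFFFFFFFFFFFFULL) != 0) {
        std::uint64_t mant =
            (bits & 0x000FFFFFFFFFFFFFULL) | 0x0010000000000000ULL;
        int exp = static_cast<int>((bits >> 52) & 0x7FF);
        result = std::ldexp(static_cast<double>(mant), exp - 1022 - 53);
    }
    return (bits >> 63) != 0 ? -result : result;
}
```

    For example, decodeDouble( encodeDouble( 0.3232 ) ) compares equal to
    0.3232 on a machine with IEEE doubles, and the eight bytes are the
    same whatever the native byte order is.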
