The Archive Base

Editorial Team
Asked: May 29, 2026

I have a homework assignment to emulate floating point casts, e.g.:

int y = /* ... */;
float x = (float)(y);

. . . but obviously without using casting. That’s fine, and I wouldn’t have a problem, except I can’t find any specific, concrete definition of how exactly such casts are supposed to operate.

I have written an implementation that works fairly well, but it doesn’t quite match up occasionally (for example, it might put a value of three in the exponent and fill the mantissa with ones, but the "ground truth" will have a value of four in the exponent and fill the mantissa with zeroes). The fact that the two are equivalent (sorta, by infinite series) is frustrating because the bit pattern is still "wrong".

Sure, I get vague things, like "round toward zero" from scattered websites, but honestly my searches keep getting clogged with C newbie questions (e.g., "What’s a cast?", "When do I use it?"). So, I can’t find a general rule that works for explicitly defining the exponent and the mantissa.

Help?

1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 10:42 pm

    Since this is homework, I’ll just post some notes about what I think is the tricky part: rounding when the integer has more significant bits than the float’s precision can hold. It sounds like you already have a solution for the basics of obtaining the exponent and mantissa.

    I’ll assume that your float representation is IEEE 754, and that rounding is performed the same way that MSVC and MinGW do: using a “banker’s rounding” scheme, i.e. round to nearest, with exact ties going to the even mantissa (I’m honestly not sure if that particular rounding scheme is required by the standard; it’s what I tested against though). The remaining discussion assumes the int to be converted is greater than 0. Negative numbers can be handled by dealing with their absolute value and setting the sign bit at the end. Of course, 0 needs to be handled specially in any case (because there’s no msb to find).

    Since there are 24 bits of precision in the mantissa (including the implied 1 for the msb), ints up to 16777215 (or 0x00ffffff) can be represented exactly. There’s nothing particularly special to do other than the bit shifting to get things in the right place and calculating the correct exponent depending on the shifts.
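As a rough sketch of that no-rounding case, assuming a 32-bit IEEE 754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 stored mantissa bits): the function names `msb_index` and `small_uint_to_float_bits` are made up for illustration, and the result is returned as a raw bit pattern rather than a float.

```c
#include <stdint.h>

/* Hypothetical helper: index (0..31) of the highest set bit of a nonzero value. */
uint32_t msb_index(uint32_t v)
{
    uint32_t i = 0;
    while (v > 1u) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Bit pattern of the float equal to v, valid only for 0 < v <= 0x00ffffff.
 * No rounding is needed because v has at most 24 significant bits. */
uint32_t small_uint_to_float_bits(uint32_t v)
{
    uint32_t msb = msb_index(v);            /* 0..23 for the allowed range */
    uint32_t mant = v << (23u - msb);       /* move the msb up to bit 23 */
    uint32_t exponent = 127u + msb;         /* biased exponent */
    return (exponent << 23) | (mant & 0x007fffffu); /* drop the implied 1 */
}
```

For example, 1 maps to 0x3f800000 (1.0f) and 16777215 maps to 0x4b7fffff.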

    However, if there are more than 24 bits of precision in the int value, you’ll need to round. I performed the rounding using these steps:

    • If the msb of the dropped bits is 0, nothing more needs to be done. The mantissa and exponent can be left alone.
    • If the msb of the dropped bits is 1, and the remaining dropped bits have one or more bits set, the mantissa needs to be incremented. If the mantissa overflows (beyond 24 bits, assuming you haven’t already dropped the implied msb), then the mantissa needs to be shifted right, and the exponent incremented.
    • If the msb of the dropped bits is 1, and the remaining dropped bits are all 0, then the mantissa is incremented only if its lsb is 1 (an exact tie rounds to the even mantissa). Handle overflow of the mantissa the same way as in the previous case.
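The three cases above could be sketched roughly like this. It assumes the value's msb sits at bit 23 + `shift` (so `shift` is between 1 and 8 for a 32-bit input) and that the mantissa still carries its implied msb at this point. The struct and function names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical result type: 24-bit mantissa plus how much to add to the exponent. */
struct rounded {
    uint32_t mantissa;   /* 24 bits, implied msb still present at bit 23 */
    uint32_t exp_carry;  /* 1 if rounding overflowed the mantissa, else 0 */
};

/* Drop the low `shift` bits of v (1 <= shift <= 8), rounding to nearest
 * with ties going to the even mantissa. */
struct rounded round_to_24_bits(uint32_t v, uint32_t shift)
{
    struct rounded r;
    uint32_t half = 1u << (shift - 1u);           /* msb of the dropped bits */
    uint32_t dropped = v & ((1u << shift) - 1u);  /* all dropped bits */
    uint32_t round_bit = dropped & half;
    uint32_t rest = dropped & (half - 1u);        /* dropped bits below the msb */

    r.mantissa = v >> shift;
    r.exp_carry = 0u;

    if (round_bit != 0u) {
        if (rest != 0u) {
            r.mantissa++;                 /* above the halfway point: round up */
        } else if ((r.mantissa & 1u) != 0u) {
            r.mantissa++;                 /* exact tie: round up only if odd */
        }
    }                                     /* round_bit == 0: leave as is */

    if ((r.mantissa & 0x01000000u) != 0u) {
        r.mantissa >>= 1;                 /* overflow past 24 bits */
        r.exp_carry = 1u;
    }
    return r;
}
```

With these assumptions, 0x01000001 (16777217) is a tie with an even mantissa and stays at 0x800000, while 0x01000003 (16777219) is a tie with an odd mantissa and rounds up to 0x800002.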

    Since the mantissa increment will overflow only when it’s all 1s, if you’re not carrying around the mantissa’s msb (i.e., if you’ve already dropped it since it’ll be dropped in the ultimate float representation), then the cases where the mantissa increment overflows can be fixed up simply by setting the mantissa to zero and incrementing the exponent. That carry is the mismatch described in the question: a mantissa of all ones with an exponent of three rounds up to a mantissa of all zeroes with an exponent of four.
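Putting the notes together, here is one possible end-to-end sketch that uses the dropped-msb variant just described. It assumes a 32-bit `int`, IEEE 754 single precision, and round-to-nearest-even. The name `int_to_float_bits` is hypothetical, and it returns the raw bit pattern so that no float cast is involved.

```c
#include <stdint.h>

/* Bit pattern of the float nearest to y (ties to even). Illustrative only. */
uint32_t int_to_float_bits(int32_t y)
{
    if (y == 0)
        return 0u;                        /* +0.0f: there is no msb to find */

    uint32_t sign = 0u;
    uint32_t mag = y;                     /* implicit conversion, modulo 2^32 */
    if (y < 0) {
        sign = 0x80000000u;
        mag = 0u - mag;                   /* absolute value; also fine for INT32_MIN */
    }

    uint32_t msb = 31u;                   /* locate the highest set bit */
    while ((mag & (1u << msb)) == 0u)
        msb--;

    uint32_t exponent = 127u + msb;       /* biased exponent */
    uint32_t mant;                        /* 23 bits, implied 1 already dropped */

    if (msb <= 23u) {
        mant = (mag << (23u - msb)) & 0x007fffffu;   /* exact, no rounding */
    } else {
        uint32_t shift = msb - 23u;
        uint32_t half = 1u << (shift - 1u);
        uint32_t dropped = mag & ((1u << shift) - 1u);
        mant = (mag >> shift) & 0x007fffffu;
        /* dropped > half: above halfway; dropped == half: tie, round to even */
        if (dropped > half || (dropped == half && (mant & 1u) != 0u)) {
            mant++;
            if (mant == 0x00800000u) {    /* mantissa was all ones */
                mant = 0u;
                exponent++;
            }
        }
    }
    return sign | (exponent << 23) | mant;
}
```

To check it against the compiler, you could copy the bytes of `(float)y` into a `uint32_t` with `memcpy` and compare the two patterns over a range of inputs. For instance, `INT32_MAX` should come out as 0x4f000000, which is the all-ones mantissa carrying into the exponent.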
