Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9066657
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 16, 20262026-06-16T16:46:32+00:00 2026-06-16T16:46:32+00:00

Short version: how can I detect overflow using the fixed-point multiplication described here but

  • 0

Short version: how can I detect overflow using the fixed-point multiplication described here but for a signed type?

Long version:

I still have some overflow issues with my Q31.32 fixed point type. To make it easier to work out examples on paper, I’ve made a much smaller type using the same algorithm, a Q3.4 based on sbyte. I figure that if I can work out all the kinks for a Q3.4 type, the same logic should apply for a Q31.32 one.

Note that I could very easily implement Q3.4 multiplication by performing it on a 16-bit integer, but I’m doing as if that didn’t exist, because for the Q31.32 I’d need a 128-bit integer which doesn’t exist (and BigInteger is too slow).

I want my multiplication to handle overflow by saturation, that is when overflow happens, the result is the highest or smallest value that can be represented depending on the sign of the operands.

This is basically how the type is represented:

struct Fix8 {
    sbyte m_rawValue;
    public static readonly Fix8 One = new Fix8(1 << 4);
    public static readonly Fix8 MinValue = new Fix8(sbyte.MinValue);
    public static readonly Fix8 MaxValue = new Fix8(sbyte.MaxValue);

    Fix8(sbyte value) {
        m_rawValue = value;
    }

    public static explicit operator decimal(Fix8 value) {
        return (decimal)value.m_rawValue / One.m_rawValue;
    }

    public static explicit operator Fix8(decimal value) {
        var nearestExact = Math.Round(value * 16m) * 0.0625m;
        return new Fix8((sbyte)(nearestExact * One.m_rawValue));
    }
}

And this is how I currently handle multiplication:

    public static Fix8 operator *(Fix8 x, Fix8 y) {
        sbyte xl = x.m_rawValue;
        sbyte yl = y.m_rawValue;

        // split x and y into their highest and lowest 4 bits
        byte xlo = (byte)(xl & 0x0F);
        sbyte xhi = (sbyte)(xl >> 4);
        byte ylo = (byte)(yl & 0x0F);
        sbyte yhi = (sbyte)(yl >> 4);

        // perform cross-multiplications
        byte lolo = (byte)(xlo * ylo);
        sbyte lohi = (sbyte)((sbyte)xlo * yhi);
        sbyte hilo = (sbyte)(xhi * (sbyte)ylo);
        sbyte hihi = (sbyte)(xhi * yhi);

        // shift results as appropriate
        byte loResult = (byte)(lolo >> 4);
        sbyte midResult1 = lohi;
        sbyte midResult2 = hilo;
        sbyte hiResult = (sbyte)(hihi << 4);

        // add everything
        sbyte sum = (sbyte)((sbyte)loResult + midResult1 + midResult2 + hiResult);

        // if the top 4 bits of hihi (unused in the result) are neither all 0s or 1s,
        // then this means the result overflowed.
        sbyte topCarry = (sbyte)(hihi >> 4);
        bool opSignsEqual = ((xl ^ yl) & sbyte.MinValue) == 0;
        if (topCarry != 0 && topCarry != -1) {
            return opSignsEqual ? MaxValue : MinValue;
        }

        // if signs of operands are equal and sign of result is negative,
        // then multiplication overflowed upwards
        // the reverse is also true
        if (opSignsEqual) {
            if (sum < 0) {
                return MaxValue;
            }
        }
        else {
            if (sum > 0) {
                return MinValue;
            }
        }

        return new Fix8(sum);
    }

This gives result accurate within the precision of the type and handles most overflow cases. It doesn’t however handle these ones, for example:

Failed -8 * 2 : expected -8 but got 0
Failed 3.5 * 5 : expected 7,9375 but got 1,5

Let’s work out how the multiplication happens for the first one.

-8 and 2 are represented as x = 0x80 and y = 0x20.
xlo = 0x80 & 0x0F = 0x00
xhi = 0x80 >> 4 = 0xf8
ylo = 0x20 & 0x0F = 0x00
yhi = 0x20 >> 4 = 0x02

lolo = xlo * ylo = 0x00
lohi = xlo * yhi = 0x00
hilo = xhi * ylo = 0x00
hihi = xhi * yhi = 0xf0

The sum is obviously 0 as all terms are 0 save for hihi, but only the lowest 4 bits of hihi are used in the final sum.

My usual overflow detection magic doesn’t work here: the result is zero so the sign of the result is meaningless (e.g. 0.0625 * -0.0625 == 0 (by rounding down), 0 is positive yet signs of operands differ); also the high bits of hihi are 1111 which often happens even when there’s no overflow.

Basically I don’t know how to detect that overflow happened here. Is there a more general method?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-16T16:46:34+00:00Added an answer on June 16, 2026 at 4:46 pm

    This took me a long time, but I eventually figured everything out. This code is tested to work for every possible combination of x and y in the range allowed by sbyte. Here is the commented code:

        static sbyte AddOverflowHelper(sbyte x, sbyte y, ref bool overflow) {
            var sum = (sbyte)(x + y);
            // x + y overflows if sign(x) ^ sign(y) != sign(sum)
            overflow |= ((x ^ y ^ sum) & sbyte.MinValue) != 0;
            return sum;
        }
    
        /// <summary>
        /// Multiplies two Fix8 numbers.
        /// Deals with overflow by saturation.
        /// </summary>
        public static Fix8 operator *(Fix8 x, Fix8 y) {
            // Using the cross-multiplication algorithm, for learning purposes.
            // It would be both trivial and much faster to use an Int16, but this technique
            // won't work for a Fix64, since there's no Int128 or equivalent (and BigInteger is too slow).
    
            sbyte xl = x.m_rawValue;
            sbyte yl = y.m_rawValue;
    
            byte xlo = (byte)(xl & 0x0F);
            sbyte xhi = (sbyte)(xl >> 4);
            byte ylo = (byte)(yl & 0x0F);
            sbyte yhi = (sbyte)(yl >> 4);
    
            byte lolo = (byte)(xlo * ylo);
            sbyte lohi = (sbyte)((sbyte)xlo * yhi);
            sbyte hilo = (sbyte)(xhi * (sbyte)ylo);
            sbyte hihi = (sbyte)(xhi * yhi);
    
            byte loResult = (byte)(lolo >> 4);
            sbyte midResult1 = lohi;
            sbyte midResult2 = hilo;
            sbyte hiResult = (sbyte)(hihi << 4);
    
            bool overflow = false;
            // Check for overflow at each step of the sum, if it happens overflow will be true
            sbyte sum = AddOverflowHelper((sbyte)loResult, midResult1, ref overflow);
            sum = AddOverflowHelper(sum, midResult2, ref overflow);
            sum = AddOverflowHelper(sum, hiResult, ref overflow);
    
            bool opSignsEqual = ((xl ^ yl) & sbyte.MinValue) == 0;
    
            // if signs of operands are equal and sign of result is negative,
            // then multiplication overflowed positively
            // the reverse is also true
            if (opSignsEqual) {
                if (sum < 0 || (overflow && xl > 0)) {
                    return MaxValue;
                }
            }
            else {
                if (sum > 0) {
                    return MinValue;
                }
                // If signs differ, both operands' magnitudes are greater than 1,
                // and the result is greater than the negative operand, then there was negative overflow.
                sbyte posOp, negOp;
                if (xl > yl) {
                    posOp = xl;
                    negOp = yl;
                }
                else {
                    posOp = yl;
                    negOp = xl;
                }
                if (sum > negOp && negOp < -(1 << 4) && posOp > (1 << 4)) {
                    return MinValue;
                }
            }
    
            // if the top 4 bits of hihi (unused in the result) are neither all 0s nor 1s,
            // then this means the result overflowed.
            sbyte topCarry = (sbyte)(hihi >> 4);
            // -17 (-1.0625) is a problematic value which never causes overflow but messes up the carry bits
            if (topCarry != 0 && topCarry != -1 && xl != -17 && yl != -17) {
                return opSignsEqual ? MaxValue : MinValue;
            }
    
            // Round up if necessary, but don't overflow
            var lowCarry = (byte)(lolo << 4);
            if (lowCarry >= 0x80 && sum < sbyte.MaxValue) {
                ++sum;
            }
    
            return new Fix8(sum);
        }
    

    I’m putting all this together into a properly unit tested fixed-point math library for .NET, which will be available here: https://github.com/asik/FixedMath.Net

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Short version: Can I grant access to external databases to a role? Long version:
short version: how can I get rid of the multiple-versions-of-python nightmare ? long version:
Short version: How can I (using a JQuery or other selector) select an HTML
Short version: How can I get a regex that matches a@a.aaaa but not a@a.aaaaa
Short version of my questions: Can anyone point me toward some good, detailed sources
Short version : I have a working solution using a UISegmentedControl but I don't
Short version: How can I make sure that my component GUIDs remain stable using
Short version: How can I map two columns from table A and B if
SHORT VERSION OF QUESTION: So basically my question is: How can I set the
The short version of the question - why can't I do this? I'm restricted

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.