The Archive Base

Editorial Team
Asked: May 14, 2026

I’m working on a fluid dynamics Navier-Stokes solver that should run in real time.

I’m working on a fluid dynamics Navier-Stokes solver that should run in real time. Hence, performance is important.

Right now, I’m looking at a number of tight loops that each account for a significant fraction of the execution time: there is no single bottleneck. Most of these loops do some floating-point arithmetic, but there’s a lot of branching in between.

The floating-point operations are mostly limited to additions, subtractions, multiplications, divisions and comparisons. All this is done using 32-bit floats. My target platform is x86 with at least SSE1 instructions. (I’ve verified in the assembler output that the compiler indeed generates SSE instructions.)

Most of the floating-point values that I’m working with have a reasonably small upper bound, and precision for near-zero values isn’t very important. So the thought occurred to me: maybe switching to fixed-point arithmetic could speed things up? I know the only way to be really sure is to measure it, but that might take days, so I’d like to know the odds of success beforehand.

Fixed-point was all the rage back in the days of Doom, but I’m not sure where it stands in 2010. Considering how much silicon is nowadays pumped into floating-point performance, is there a chance that fixed-point arithmetic will still give me a significant speed boost? Does anyone have any real-world experience that may apply to my situation?
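For concreteness, fixed-point here means storing each value as a scaled integer. The minimal sketch below assumes a hypothetical Q16.16 format (16 integer and 16 fractional bits in an int32_t); the fix16 names are illustrative, not from any library. It shows where the costs sit: addition and subtraction are plain integer operations, while multiplication and division need a 64-bit intermediate and a rescale by 2^16.

```c
#include <stdint.h>

/* Hypothetical Q16.16 format: a signed 32-bit integer whose low 16 bits hold
   the fraction, so the real value is raw / 65536. Range is roughly +/-32768. */
typedef int32_t fix16;

#define FIX16_ONE 65536 /* 1.0 in Q16.16 */

/* Conversions truncate toward zero; inputs must lie inside the Q16.16 range. */
static inline fix16 fix16_from_float(float f) { return (fix16)(f * FIX16_ONE); }
static inline float fix16_to_float(fix16 x) { return (float)x / FIX16_ONE; }

/* Addition and subtraction are ordinary integer operations (no overflow check). */
static inline fix16 fix16_add(fix16 a, fix16 b) { return a + b; }
static inline fix16 fix16_sub(fix16 a, fix16 b) { return a - b; }

/* Multiplication: widen to 64 bits, multiply, then divide out one scale
   factor (truncating toward zero). The result must fit back in Q16.16. */
static inline fix16 fix16_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) / FIX16_ONE);
}

/* Division: pre-scale the dividend by 2^16 so the quotient keeps its
   fractional bits. b must be non-zero and the result must fit in Q16.16. */
static inline fix16 fix16_div(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * FIX16_ONE) / b);
}
```

Note that SSE1 has no packed integer arithmetic on XMM registers (that arrived with SSE2), and a packed 32-bit multiply (pmulld) only arrived with SSE4.1, so a vectorized version of this would not be a drop-in replacement on the stated SSE1 baseline.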

1 Answer

    Editorial Team
    Added an answer on May 14, 2026 at 8:33 am

    As other people have said, if you’re already using floating-point SIMD, I doubt you’ll get much improvement with fixed point.

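    You said that the compiler is emitting SSE instructions, but it doesn’t sound like you’ve tried writing your own vectorized SSE code. Emitting SSE instructions isn’t the same as vectorizing: on x86, compilers also use SSE for plain scalar float math (e.g. addss/mulss), which processes one value at a time. I don’t know how good compilers usually are at auto-vectorization, but it’s something to investigate.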
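    As a starting point for hand-written SSE, here is a minimal sketch using only SSE1 intrinsics from xmmintrin.h. The kernel (select_kernel and its formula) is a hypothetical stand-in for one of your loops. Because your loops branch a lot, it shows the usual trick of computing both sides of a branch for four lanes and blending them with a compare mask, so the vector loop contains no data-dependent jump.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE1 intrinsics */

/* Hypothetical kernel: out[i] = (a[i] < b[i]) ? a[i] * s : b[i] + s.
   The branch is replaced by a compare mask plus a bitwise select. */
void select_kernel(float *out, const float *a, const float *b, float s, size_t n)
{
    const __m128 vs = _mm_set1_ps(s);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);     /* unaligned loads: no alignment assumed */
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 mask = _mm_cmplt_ps(va, vb);  /* all-ones lanes where a < b */
        __m128 t = _mm_mul_ps(va, vs);       /* "then" result for every lane */
        __m128 f = _mm_add_ps(vb, vs);       /* "else" result for every lane */
        /* select: (mask & t) | (~mask & f) */
        __m128 r = _mm_or_ps(_mm_and_ps(mask, t), _mm_andnot_ps(mask, f));
        _mm_storeu_ps(out + i, r);
    }
    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; ++i)
        out[i] = (a[i] < b[i]) ? a[i] * s : b[i] + s;
}
```

    This only tends to pay off when both sides of the branch are cheap; if one side is expensive and rarely taken, a scalar branch may still win, so it is worth measuring per loop.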

    Two other areas to look at are:

    1. Memory access – if all your computations are done in SSE, then cache misses might be taking up more time than the actual math.

      1. You can prefetch data with e.g. _mm_prefetch or __builtin_prefetch (depending on your compiler/platform).
      2. Check your expensive functions for aliasing between inputs and outputs; if the compiler can’t prove that an output pointer doesn’t overlap an input, it has to emit extra memory reads/writes to stay correct.
      3. Consider storing your data differently – if your fluid solver solves for x coordinates independently of y’s, it might be more cache friendly to store them in different arrays. If they’re solved for together, consider interleaving them (e.g. x y x y…).
    2. Unrolling – you should be able to get a performance benefit from unrolling your inner loops. The goal is not (as many people think) to reduce the number of loop termination checks. The main benefit is that independent instructions can be interleaved, which hides instruction latency. There’s a presentation entitled “VMX Optimization: Taking it up a Level” that might help a bit; it’s focused on AltiVec instructions on the Xbox 360, but some of the unrolling advice should carry over to SSE as well.
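    For the prefetching suggestion under memory access, a minimal sketch with _mm_prefetch from xmmintrin.h. It assumes an indirect (gathered) access pattern, where the hardware prefetcher generally can’t guess the next address; for plain sequential loops the hardware prefetcher often keeps up on its own, so explicit prefetches there may not buy anything. The function name and the distance of 16 iterations are hypothetical and would need tuning by measurement.

```c
#include <stddef.h>
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_T0 (SSE1) */

/* Hypothetical prefetch distance, in loop iterations; tune by measuring. */
#define PREFETCH_DIST 16

/* Sums x[idx[0]] + x[idx[1]] + ... for an arbitrary index array. Since the
   addresses are data-dependent, we request the element that will be needed
   PREFETCH_DIST iterations from now. */
float gather_sum(const float *x, const int *idx, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        /* Stay inside idx so we never read past its end. With GCC/Clang,
           __builtin_prefetch(&x[idx[i + PREFETCH_DIST]], 0, 3) is equivalent. */
        if (i + PREFETCH_DIST < n)
            _mm_prefetch((const char *)&x[idx[i + PREFETCH_DIST]], _MM_HINT_T0);
        sum += x[idx[i]];
    }
    return sum;
}
```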
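    For the aliasing point, a minimal C99 sketch (function names hypothetical). In the first version a store to out[i] could change *offset, so the compiler generally has to reload it on every iteration; restrict promises the pointers don’t overlap, so *offset can be loaded once. Copying *offset into a local variable before the loop has the same effect without restrict. In C++, restrict isn’t standard, but GCC, Clang and MSVC accept __restrict.

```c
#include <stddef.h>

/* Adds *offset to every element. Without restrict, offset might point into
   out, so after each store the compiler must assume *offset may have changed
   and reload it. */
void add_offset(float *out, const float *offset, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] += *offset;
}

/* restrict (C99) promises the two pointers don't overlap, so *offset can be
   kept in a register. Passing overlapping pointers here is undefined behavior. */
void add_offset_restrict(float *restrict out, const float *restrict offset, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] += *offset;
}
```

    The overlapping call add_offset(w, &w[0], n) is legal only for the first version, and there the running update of w[0] is visible to later iterations, which is exactly why the compiler can’t hoist the load.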
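    For the data-layout point, a sketch of the two layouts (type and function names are hypothetical). A pass that only reads x wastes roughly half of every cache line on the interleaved layout, while a pass that needs x and y together benefits from having them adjacent.

```c
#include <stddef.h>

/* Interleaved layout ("array of structs"): x y x y ... in memory. */
typedef struct { float x, y; } Vec2;

/* Separate layout ("struct of arrays"): all x values, then all y values. */
typedef struct {
    float *x;
    float *y;
    size_t n;
} Vec2Field;

/* A pass that only needs x: with separate arrays every loaded byte is used,
   and consecutive x values map directly onto 4-wide SSE loads. */
void scale_x_soa(Vec2Field *f, float k)
{
    for (size_t i = 0; i < f->n; ++i)
        f->x[i] *= k;
}

/* The same pass on the interleaved layout: each cache line also carries y
   values this loop never touches, so about half the memory traffic is wasted. */
void scale_x_aos(Vec2 *v, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        v[i].x *= k;
}

/* A pass that needs x and y together: interleaving keeps each pair adjacent,
   so one cache line serves both reads. */
float sum_sq_aos(const Vec2 *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += v[i].x * v[i].x + v[i].y * v[i].y;
    return s;
}
```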
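    For the unrolling point, a minimal scalar sketch of why independent instructions matter (function names hypothetical). In the simple loop every addition waits for the previous one; with four accumulators there are four independent dependency chains the CPU can overlap. Compilers generally won’t make this change on their own for floats without options such as -ffast-math, because it changes the rounding.

```c
#include <stddef.h>

/* Straightforward dot product: each addition depends on the previous sum,
   so throughput is limited by the latency of the floating-point add. */
float dot_simple(const float *a, const float *b, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Unrolled by 4 with four independent accumulators, so four additions can be
   in flight at once. The summation order differs from dot_simple, so results
   may differ in the last bits. */
float dot_unrolled4(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)  /* leftover elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```

    The same idea carries over to hand-written SSE by keeping several __m128 accumulators and combining them after the loop.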

    As other people have mentioned, profile, profile, profile. And then let us know what’s still slow 🙂

    PS – on one of your other posts here, I convinced you to use SOR instead of Gauss-Seidel in your matrix solver. Now that I think about it, is there a reason that you’re not using a tri-diagonal solver?
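    For reference on that last question, a tri-diagonal system can be solved directly in O(n) operations with the Thomas algorithm (Gaussian elimination specialized to three diagonals), instead of iterating as SOR or Gauss-Seidel do. A minimal sketch, assuming a diagonally dominant matrix so no pivoting is needed; the function name and argument layout are illustrative. Whether it applies depends on the discretization: a 2-D pressure matrix isn’t tri-diagonal as a whole, but line-by-line schemes such as ADI solve one tri-diagonal system per row or column.

```c
#include <stddef.h>

/* Solves a tri-diagonal system with the Thomas algorithm.
   Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i],
   with a[0] and c[n-1] unused. c and d are overwritten as scratch space and
   the solution is written to x. Assumes n >= 1 and a matrix that needs no
   pivoting (e.g. diagonally dominant). */
void thomas_solve(size_t n, const float *a, const float *b,
                  float *c, float *d, float *x)
{
    /* Forward sweep: eliminate the sub-diagonal. */
    c[0] = c[0] / b[0];
    d[0] = d[0] / b[0];
    for (size_t i = 1; i < n; ++i) {
        float m = b[i] - a[i] * c[i - 1];
        c[i] = c[i] / m;  /* c[n-1] is unused afterwards; computing it is harmless */
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }
    /* Back substitution, from the last row up to row 0. */
    x[n - 1] = d[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        x[i] = d[i] - c[i] * x[i + 1];
}
```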

© 2021 The Archive Base. All Rights Reserved