I have a simple (but performance critical) algorithm in C (embedded in C++) to

Question

Asked: May 26, 2026 (2026-05-26T12:52:22+00:00)

I have a simple (but performance-critical) algorithm in C (embedded in C++) to manipulate a data buffer. The algorithm ‘naturally’ uses 64-bit big-endian register values, and I’d like to optimise it using assembler to gain direct access to the carry flag and BSWAP and, hence, avoid having to manipulate the 64-bit values one byte at a time.

I want the solution to be portable between operating systems and compilers, minimally supporting GNU g++ on Linux and Visual C++ on Windows. On both platforms I’m assuming a processor that supports the x86-64 instruction set.

I’ve found this document about inline assembler for MSVC/Windows, and several fragments via Google detailing an incompatible syntax for g++. I accept that I might need to implement this functionality separately in each dialect. I’ve not been able to find sufficiently detailed documentation on syntax/facilities to tackle this development.

What I’m looking for is clear documentation detailing the facilities available to me, with both the MS and GNU tool sets. While I wrote some 32-bit assembler many years ago, I’m rusty, and I’d benefit from a concise document detailing what facilities are available at the assembly level.

A further complication is that I’d like to compile for Windows using Visual C++ Express Edition 2010… I recognise that this is a 32-bit compiler, but I wondered: is it possible to embed 64-bit assembly into its executables? I only care about 64-bit performance in the section I plan to hand-code.

Can anyone offer any pointers (please pardon the pun…)?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T12:52:23+00:00

Just to give you a taste of the obstacles that lie in your path, here is a simple inline assembler function, in two dialects. First, the Borland C++ Builder version (I think this compiles under MSVC++ too):

int BNASM_AddScalar (DWORD* result, DWORD x)
  {
  int carry = 0 ;
  __asm
    {
    mov     ebx,result
    xor     eax,eax
    mov     ecx,x
    add     [ebx],ecx
    adc     carry,eax    // Return the carry flag
    }
  return carry ;
  }

Now, the g++ version:

int BNASM_AddScalar (DWORD* result, DWORD x)
  {
  int carry = 0 ;
  asm volatile (
"    addl    %%ecx,(%%edx)\n"   // *result += x
"    adcl    $0,%%eax\n"        // Return the carry flag
: "+a"(carry)         // Output (and input): carry in eax
: "d"(result), "c"(x) // Input: result in edx and x in ecx
: "memory", "cc"      // Clobbers: the asm writes *result and changes the flags
) ;
  return carry ;
  }

As you can see, the differences are major, and within inline assembler there is no way around them. These are from a large integer arithmetic library that I wrote for a 32-bit environment.
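One way to sidestep the dialect split, at the cost of giving up hand-written assembler, is to let the compiler pick the instructions. The following is only a sketch of that swapped-in approach, not code from the library above: the names `bswap64` and `add_scalar64` are hypothetical, `_byteswap_uint64` is the MSVC intrinsic and `__builtin_bswap64` the GCC/Clang builtin, and the carry is detected in plain C++ rather than read from the flag register. Whether the compiler actually emits BSWAP and ADC depends on the compiler and optimisation level, so the generated code is worth checking.

```cpp
#include <cstdint>
#include <cstdlib>   // declares _byteswap_uint64 on MSVC

// Hypothetical helper: reverse the byte order of a 64-bit value
// (converts between big-endian and little-endian representations).
static inline std::uint64_t bswap64(std::uint64_t v)
{
#if defined(_MSC_VER)
    return _byteswap_uint64(v);      // MSVC intrinsic
#else
    return __builtin_bswap64(v);     // GCC/Clang builtin
#endif
}

// Hypothetical helper: *result += x, returning the carry out of the
// 64-bit addition (0 or 1).
static inline int add_scalar64(std::uint64_t* result, std::uint64_t x)
{
    std::uint64_t sum = *result + x;   // unsigned arithmetic wraps modulo 2^64
    int carry = (sum < x) ? 1 : 0;     // a wrapped sum is smaller than either addend
    *result = sum;
    return carry;
}
```

The same source compiles with both tool sets, which is the main attraction; the trade-off is that instruction selection is left to the optimiser.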

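As for embedding 64-bit instructions in a 32-bit executable, I think this is forbidden. As I understand it, a 32-bit executable runs in 32-bit mode, where the 64-bit registers are not available: a 64-bit instruction encoding is either invalid there or is decoded as a different 32-bit instruction, so the code traps or silently does the wrong thing.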
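For a genuine 64-bit build, the g++ dialect carries over with 64-bit operands. The following is a minimal sketch, assuming g++ (or a compatible compiler) targeting x86-64; `AddScalar64` and `ByteSwap64` are hypothetical names, not part of the library quoted above. It uses named operands and lets the compiler choose the registers instead of hard-coding them. Microsoft's compiler does not accept `__asm` blocks when targeting x64, so this sketch falls back to plain C++ on any other compiler or target.

```cpp
#include <cstdint>

// Hypothetical 64-bit counterpart of BNASM_AddScalar:
// *result += x, returning the carry (0 or 1).
static inline int AddScalar64(std::uint64_t* result, std::uint64_t x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    int carry = 0;
    asm volatile (
        "addq %[x], %[mem]\n\t"    // *result += x; CF is set on wrap-around
        "adcl $0, %[c]\n\t"        // carry += CF
        : [c] "+r"(carry), [mem] "+m"(*result)   // read-write operands
        : [x] "r"(x)                             // input: any 64-bit register
        : "cc");                                 // the flags are modified
    return carry;
#else
    // Fallback for other compilers/targets: detect wrap-around in plain C++.
    std::uint64_t sum = *result + x;
    *result = sum;
    return sum < x ? 1 : 0;
#endif
}

// Hypothetical helper: reverse the byte order of a 64-bit value.
static inline std::uint64_t ByteSwap64(std::uint64_t v)
{
#if defined(__GNUC__) && defined(__x86_64__)
    asm ("bswap %0" : "+r"(v));    // a 64-bit register operand selects the 64-bit form
    return v;
#else
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) { r = (r << 8) | (v & 0xFF); v >>= 8; }
    return r;
#endif
}
```

Declaring `*result` as a `"+m"` operand tells the compiler exactly which memory the asm reads and writes, so a blanket `"memory"` clobber is not needed here.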
