I was playing with this option to optimize a for-loop in our embedded architecture (here). However, I noticed that when the alignment requires more than a single nop instruction, the compiler emits one nop followed by as many zeros (0000) as required.
I suspect it is a bug in our compiler, but can someone confirm it is not a bug in GCC?
Here’s a code snippet:
    __asm__ volatile("nop");
    __asm__ volatile("nop");
    for (j0 = 0; j0 < N; j0 += 4)
    {
        c[j0 + 0] = a[j0 + 0] + b[j0 + 0];
        c[j0 + 1] = a[j0 + 1] + b[j0 + 1];
        c[j0 + 2] = a[j0 + 2] + b[j0 + 2];
        c[j0 + 3] = a[j0 + 3] + b[j0 + 3];
    }
Compile with -falign-loops=8 (or whatever value is relevant to your architecture and larger than the minimum required alignment). You can add or remove the __asm__ lines as necessary to produce a misaligned loop body.
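For anyone who wants to reproduce this, the snippet could be wrapped in a complete file roughly as follows. This is only a sketch: the value of N, the int element type, and the function name add4 are assumptions made for illustration and are not taken from our real code.

```c
/* foo.c: self-contained reproduction sketch.
 * N, the int element type, and the name add4 are assumptions. */
#define N 64 /* assumed size; must be a multiple of 4 for the unrolled loop */

void add4(const int *a, const int *b, int *c)
{
    int j0;

    /* Add or remove nops here to move the loop head off the
     * requested alignment boundary. */
    __asm__ volatile("nop");
    __asm__ volatile("nop");

    /* Loop unrolled by four, as in the original snippet. */
    for (j0 = 0; j0 < N; j0 += 4)
    {
        c[j0 + 0] = a[j0 + 0] + b[j0 + 0];
        c[j0 + 1] = a[j0 + 1] + b[j0 + 1];
        c[j0 + 2] = a[j0 + 2] + b[j0 + 2];
        c[j0 + 3] = a[j0 + 3] + b[j0 + 3];
    }
}
```

The file has no main on purpose, since only the generated code for the loop matters here. The loop alignment options normally take effect only when optimization is enabled, so it is probably worth adding something like -O2 next to -falign-loops=8.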
Use gcc -S -o foo.s foo.c to generate the assembly output without assembling it. I suspect you'll see the .balign or .p2align directive in the asm. Assuming this directive is intended to work, I think it's a bug in the assembler. It's also possible that you've put the code in a non-default section (i.e. not .text) either intentionally or accidentally (e.g. with a misplaced .data or .section in some other inline asm); normally the assembler pads with the proper size and number of nop instructions for sections that contain code, and 0 bytes for sections that contain data.