Referring to @auselen’s answer here: Using ARM NEON intrinsics to add alpha and permute, it looks like the armcc compiler is far better than gcc at NEON optimization. Is this really true? I haven’t tried armcc myself, but I got fairly well-optimized code from gcc with the -O3 flag. Now I’m wondering whether armcc is really that good. Which of the two compilers is better, all factors considered?
Compilers are software too, and they tend to improve over time. A generic claim such as "armcc is better than GCC on NEON" (or, more precisely, on vectorization) can’t hold forever, since one developer group can close the gap with enough attention. Initially, however, it is reasonable to expect compilers developed by hardware companies to be superior, because those companies need to demonstrate and market these features.
One recent example I saw here on Stack Overflow was an answer about branch prediction. Quoting the last line of its updated section: “This goes to show that even mature modern compilers can vary wildly in their ability to optimize code…”.
I am a big fan of GCC, but I wouldn’t bet on the quality of its code against compilers from Intel or ARM. I expect any mainstream commercial compiler to produce code at least as good as GCC’s.
One empirical way to answer this question is to take hilbert-space’s NEON optimization example and see how different compilers optimize it.
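For reference, here is a minimal sketch of the kind of kernel involved, assuming the example in question is the RGB-to-grayscale conversion with 8-bit weights 77/151/28. It is not the exact source that was compiled for the listings in this answer; the function name `rgb_to_gray` and the scalar fallback are my own additions so that the file also builds on non-NEON targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: convert n packed RGB pixels to 8-bit gray.
   gray = (77*R + 151*G + 28*B) >> 8, weights sum to 256. */
void rgb_to_gray(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x8_t wr = vdup_n_u8(77);
    const uint8x8_t wg = vdup_n_u8(151);
    const uint8x8_t wb = vdup_n_u8(28);

    /* Process 8 pixels per iteration. */
    for (; i + 8 <= n; i += 8) {
        uint8x8x3_t rgb = vld3_u8(src + 3 * i);     /* de-interleave R, G, B */
        uint16x8_t acc = vmull_u8(rgb.val[0], wr);  /* widening multiply */
        acc = vmlal_u8(acc, rgb.val[1], wg);        /* multiply-accumulate */
        acc = vmlal_u8(acc, rgb.val[2], wb);
        vst1_u8(dst + i, vshrn_n_u16(acc, 8));      /* narrow with >> 8 */
    }
#endif

    /* Scalar path: remaining pixels, or everything when NEON is unavailable. */
    for (; i < n; ++i) {
        const uint8_t *p = src + 3 * i;
        dst[i] = (uint8_t)((77u * p[0] + 151u * p[1] + 28u * p[2]) >> 8);
    }
}
```

Both paths compute the same integer formula, so the output does not depend on which one the compiler selects.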
This is armcc 5.01
This is GCC 4.4.3-4.7.1
These look extremely similar, so we have a draw. After seeing this, I tried the add-alpha-and-permute example mentioned above again.
Compiling with gcc…
Compiling with armcc…
In this case armcc produces much better code. I think this justifies fgp’s answer above. Most of the time GCC will produce good enough code, but you should keep an eye on the critical parts and, most importantly, measure and profile first.
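If you want to repeat the comparison with your own toolchains, a kernel along these lines is enough to feed to both compilers. This is only a sketch under assumptions: I take "add alpha and permute" to mean expanding packed 3-byte RGB pixels into 4-byte pixels with the channels reordered and an opaque alpha appended. The B, G, R, A output order and the name `rgb_to_bgra` are illustrative choices, not the code from the linked question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: expand n RGB pixels to B, G, R, A with A = 0xFF. */
void rgb_to_bgra(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint8x8_t alpha = vdup_n_u8(0xFF);

    /* 8 pixels per iteration: 24 bytes in, 32 bytes out. */
    for (; i + 8 <= n; i += 8) {
        uint8x8x3_t rgb = vld3_u8(src + 3 * i);  /* de-interleaving load */
        uint8x8x4_t out;
        out.val[0] = rgb.val[2];                 /* B */
        out.val[1] = rgb.val[1];                 /* G */
        out.val[2] = rgb.val[0];                 /* R */
        out.val[3] = alpha;                      /* A */
        vst4_u8(dst + 4 * i, out);               /* interleaving store */
    }
#endif

    /* Scalar path: remaining pixels, or everything when NEON is unavailable. */
    for (; i < n; ++i) {
        dst[4 * i + 0] = src[3 * i + 2];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 0];
        dst[4 * i + 3] = 0xFF;
    }
}
```

Compile it with each compiler at its highest optimization level and ask for assembly output (for GCC, `-O3 -S`), then compare the inner loops and time them on the target hardware before drawing conclusions.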