I was wondering how ARM floating point performance on smartphones compares to x86. For this purpose I wrote the following code:
#include "Linderdaum.h"
sEnvironment* Env = NULL;
volatile float af = 1.0f;
volatile float bf = 1.0f;
volatile int a = 1;
volatile int b = 1;
APPLICATION_ENTRY_POINT
{
    Env = new sEnvironment();
    Env->DeployDefaultEnvironment( "", "CommonMedia" );

    double Start = Env->GetSeconds();
    float Sum1 = 0.0f;
    for ( int i = 0; i != 200000000; i++ ) { Sum1 += af + bf; }
    double End = Env->GetSeconds();
    Env->Logger->Log( L_DEBUG, LStr::ToStr( Sum1, 4 ) );
    Env->Logger->Log( L_DEBUG, "Float: " + LStr::ToStr( End-Start, 5 ) );

    Start = Env->GetSeconds();
    int Sum2 = 0;
    for ( int i = 0; i != 200000000; i++ ) { Sum2 += a + b; }
    End = Env->GetSeconds();
    Env->Logger->Log( L_DEBUG, LStr::ToStr( Sum2, 4 ) );
    Env->Logger->Log( L_DEBUG, "Int: " + LStr::ToStr( End-Start, 5 ) );

    Env->RequestExit();
    APPLICATION_EXIT_POINT( Env );
}
APPLICATION_SHUTDOWN
{}
Here are the results for different targets and compilers.
1. Windows PC on Core i7 920.
VS 2008, debug build, Win32/x86
(Main):01:30:11.769 Float: 0.72119
(Main):01:30:12.347 Int: 0.57875
float is slower than int.
VS 2008, debug build, Win64/x86-64
(Main):01:43:39.468 Float: 0.72247
(Main):01:43:40.040 Int: 0.57212
VS 2008, release build, Win64/x86-64
(Main):01:39:25.844 Float: 0.21671
(Main):01:39:26.060 Int: 0.21511
VS 2008, release build, Win32/x86
(Main):01:33:27.603 Float: 0.70670
(Main):01:33:27.814 Int: 0.21130
int is gaining the lead.
2. Samsung Galaxy S smartphone.
GCC 4.3.4, armeabi-v7a, -mfpu=vfp -mfloat-abi=softfp -O3
01-27 01:31:01.171 I/LEngine (15364): (Main):01:31:01.177 Float: 6.47994
01-27 01:31:02.257 I/LEngine (15364): (Main):01:31:02.262 Int: 1.08442
float is seriously slower than int.
Let’s now change addition to multiplication inside the loops:
float Sum1 = 2.0f;
for ( int i = 0; i != 200000000; i++ )
{
    Sum1 *= af * bf;
}
...
int Sum2 = 2;
for ( int i = 0; i != 200000000; i++ )
{
    Sum2 *= a * b;
}
VS 2008, debug build, Win32/x86
(Main):02:00:39.977 Float: 0.87484
(Main):02:00:40.559 Int: 0.58221
VS 2008, debug build, Win64/x86-64
(Main):01:59:27.175 Float: 0.77970
(Main):01:59:27.739 Int: 0.56328
VS 2008, release build, Win32/x86
(Main):02:05:10.413 Float: 0.86724
(Main):02:05:10.631 Int: 0.21741
VS 2008, release build, Win64/x86-64
(Main):02:09:58.355 Float: 0.29311
(Main):02:09:58.571 Int: 0.21595
GCC 4.3.4, armeabi-v7a, -mfpu=vfp -mfloat-abi=softfp -O3
01-27 02:02:20.152 I/LEngine (15809): (Main):02:02:20.156 Float: 6.97402
01-27 02:02:22.765 I/LEngine (15809): (Main):02:02:22.769 Int: 2.61264
The question is: what am I missing (any compiler options)? Is floating point math really slower than integer math on ARM devices?
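For anyone who wants to reproduce the addition test without the Linderdaum engine, here is a minimal standalone sketch. It assumes a C++11 compiler (for `std::chrono`), and the names `TimeAdd` and `RunBenchmark` are made up for illustration; it is not the code that produced the numbers above.

```cpp
#include <chrono>
#include <cstdio>

// volatile operands force a real load on every iteration, as in the original test
volatile float af = 1.0f;
volatile float bf = 1.0f;
volatile int a = 1;
volatile int b = 1;

// Hypothetical helper: runs Sum += X + Y for the given number of iterations.
// Returns the elapsed time in seconds; the final sum is stored in Result
// so the compiler cannot discard the loop.
template <typename T>
double TimeAdd( const volatile T& X, const volatile T& Y, int Iterations, T& Result )
{
    auto Start = std::chrono::steady_clock::now();
    T Sum = 0;
    for ( int i = 0; i != Iterations; i++ ) { Sum += X + Y; }
    auto End = std::chrono::steady_clock::now();
    Result = Sum;
    return std::chrono::duration<double>( End - Start ).count();
}

// Hypothetical driver: same iteration count as the original test.
void RunBenchmark()
{
    const int N = 200000000;
    float Sum1 = 0.0f;
    int Sum2 = 0;
    double FloatTime = TimeAdd( af, bf, N, Sum1 );
    double IntTime = TimeAdd( a, b, N, Sum2 );
    std::printf( "%.4f\nFloat: %.5f\n", static_cast<double>( Sum1 ), FloatTime );
    std::printf( "%d\nInt: %.5f\n", Sum2, IntTime );
}
```

Calling `RunBenchmark()` from `main` prints the two sums and timings in roughly the same layout as the logs above. The timer is `steady_clock` rather than the engine's `GetSeconds()`, so absolute numbers may differ slightly.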
See the float03 directory at http://github.com/dwelch67/stm32f4d
The test compares two functions, fixed point vs float.
The results are not too surprising: the 0x4E2C time is fixed point and 0x4E2E is float. There are a few extra instructions in the float test function that likely account for the difference.
The FPU in the STM32F4 is a single-precision-only version of the VFP found in its big brothers and sisters. You should be able to perform the above test on any ARMv7 with VFP hardware.
Having the __aeabi_fadd function linked in, with that extra call made each time through the loop, plus the additional time spent on memory accesses and possible conversions outside or inside (vmov) the library function, can all add to what you are seeing. The answer, of course, is in the disassembly.
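One way to check the disassembly is to isolate the float loop in a small file and compile it to assembly only. The sketch below is an assumption about how you might set that up: the file name `fadd_check.cpp` and the function name `FloatAddLoop` are hypothetical, and the compiler name is a placeholder for whatever ARM cross compiler you use.

```cpp
// Hypothetical file: fadd_check.cpp
// Compile to assembly only, reusing the flags from the question, for example:
//   <your-arm-g++> -O3 -mfpu=vfp -mfloat-abi=softfp -S fadd_check.cpp
// Then search the generated .s file:
//   "bl __aeabi_fadd"  means each add is a call into the software float library
//   "fadds" (older GNU syntax) or "vadd.f32" (UAL syntax) means a VFP hardware add

volatile float af = 1.0f;
volatile float bf = 1.0f;

// Same inner loop as the float test, isolated so the generated code is easy to find.
// extern "C" keeps the symbol name unmangled in the assembly listing.
extern "C" float FloatAddLoop( int Iterations )
{
    float Sum = 0.0f;
    for ( int i = 0; i != Iterations; i++ ) { Sum += af + bf; }
    return Sum;
}
```

If the listing shows a library call inside the loop, the build is not using the VFP for that operation, and the target architecture and float options are the first things to recheck.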