I was wondering how ARM floating-point performance on smartphones compares to x86.

Asked: June 11, 2026

I was wondering how ARM floating-point performance on smartphones compares to x86. To find out, I wrote the following code:

#include "Linderdaum.h"
sEnvironment* Env = NULL;

volatile float af = 1.0f;
volatile float bf = 1.0f;
volatile int a = 1;
volatile int b = 1;

APPLICATION_ENTRY_POINT
{
    Env = new sEnvironment();

    Env->DeployDefaultEnvironment( "", "CommonMedia" );

    double Start = Env->GetSeconds();

    float Sum1 = 0.0f;

    for ( int i = 0; i != 200000000; i++ )
    {
        Sum1 += af + bf;
    }

    double End = Env->GetSeconds();

    Env->Logger->Log( L_DEBUG, LStr::ToStr( Sum1, 4 ) );
    Env->Logger->Log( L_DEBUG, "Float: " + LStr::ToStr( End-Start, 5 ) );

    Start = Env->GetSeconds();

    int Sum2 = 0;

    for ( int i = 0; i != 200000000; i++ )
    {
        Sum2 += a + b;
    }

    End = Env->GetSeconds();

    Env->Logger->Log( L_DEBUG, LStr::ToStr( Sum2, 4 ) );
    Env->Logger->Log( L_DEBUG, "Int: " + LStr::ToStr( End-Start, 5 ) );

    Env->RequestExit();

    APPLICATION_EXIT_POINT( Env );
}

APPLICATION_SHUTDOWN
{}

Here are the results for different targets and compilers.

1. Windows PC on Core i7 920.

VS 2008, debug build, Win32/x86

(Main):01:30:11.769   Float: 0.72119
(Main):01:30:12.347   Int: 0.57875

float is slower than int.

VS 2008, debug build, Win64/x86-64

(Main):01:43:39.468   Float: 0.72247
(Main):01:43:40.040   Int: 0.57212

VS 2008, release build, Win64/x86-64

(Main):01:39:25.844   Float: 0.21671
(Main):01:39:26.060   Int: 0.21511

VS 2008, release build, Win32/x86

(Main):01:33:27.603   Float: 0.70670
(Main):01:33:27.814   Int: 0.21130

int takes a clear lead in the Win32 release build.

2. Samsung Galaxy S smartphone.

GCC 4.3.4, armeabi-v7a, -mfpu=vfp -mfloat-abi=softfp -O3

01-27 01:31:01.171 I/LEngine (15364): (Main):01:31:01.177   Float: 6.47994
01-27 01:31:02.257 I/LEngine (15364): (Main):01:31:02.262   Int: 1.08442

float is seriously slower than int.

Let’s now change addition to multiplication inside the loops:

float Sum1 = 2.0f;

for ( int i = 0; i != 200000000; i++ )
{
    Sum1 *= af * bf;
}
...
int Sum2 = 2;

for ( int i = 0; i != 200000000; i++ )
{
    Sum2 *= a * b;
}

VS 2008, debug build, Win32/x86

(Main):02:00:39.977   Float: 0.87484
(Main):02:00:40.559   Int: 0.58221

VS 2008, debug build, Win64/x86-64

(Main):01:59:27.175   Float: 0.77970
(Main):01:59:27.739   Int: 0.56328

VS 2008, release build, Win32/x86

(Main):02:05:10.413   Float: 0.86724
(Main):02:05:10.631   Int: 0.21741

VS 2008, release build, Win64/x86-64

(Main):02:09:58.355   Float: 0.29311
(Main):02:09:58.571   Int: 0.21595

GCC 4.3.4, armeabi-v7a, -mfpu=vfp -mfloat-abi=softfp -O3

01-27 02:02:20.152 I/LEngine (15809): (Main):02:02:20.156   Float: 6.97402
01-27 02:02:22.765 I/LEngine (15809): (Main):02:02:22.769   Int: 2.61264

The question is: what am I missing (any compiler options)? Is floating-point math really slower than integer math on ARM devices?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T05:14:55+00:00

See the float03 directory at http://github.com/dwelch67/stm32f4d

The test compares these two functions, fixed point vs. float:

.thumb_func
.globl add
add:
    mov r3,#0
loop:
    add r3,r0,r1
    sub r2,#1
    bne loop
    mov r0,r3
    bx lr

.thumb_func
.globl m4add
m4add:
    vmov s0,r0
    vmov s1,r1
m4loop:
    vadd.f32 s2,s0,s1
    sub r2,#1
    bne m4loop
    vmov r0,s2
    bx lr

The results are not too surprising: the 0x4E2C time is fixed point and 0x4E2E is float. The float test function has a few extra instructions, which likely account for the difference.

The FPU in the STM32F4 is a single-precision-only version of the VFP found in its bigger siblings. You should be able to run the above test on any ARMv7 with VFP hardware.
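
Before running such a test, it can help to confirm what the compiler was actually told to target. The following is a minimal sketch, assuming a GCC- or Clang-style compiler: __SOFTFP__ is the macro GCC defines for -mfloat-abi=soft, and __ARM_FP is the ACLE bitmask of hardware-supported precisions (bit value 0x4 means single precision). Older toolchains may not define __ARM_FP at all, so the fallback branch is deliberately vague. The function name describe_float_target is made up for this illustration.

```cpp
#include <string>

// Hypothetical helper: reports which floating-point configuration
// this translation unit was compiled for, based on predefined macros.
std::string describe_float_target()
{
#if !defined(__arm__) && !defined(__aarch64__)
    return "not an ARM target";
#elif defined(__SOFTFP__)
    // GCC defines __SOFTFP__ for -mfloat-abi=soft: float math goes
    // through library calls such as __aeabi_fadd.
    return "ARM, software floating point (library calls)";
#elif defined(__ARM_FP)
    // ACLE: bit value 0x4 of __ARM_FP means hardware single precision.
    return ( __ARM_FP & 0x4 ) ? "ARM, hardware single precision available"
                              : "ARM, hardware FP without single precision";
#else
    return "ARM, no __SOFTFP__ but __ARM_FP is not defined (older compiler?)";
#endif
}
```

Logging the returned string once at startup, built with the same -mfpu and -mfloat-abi flags as the benchmark, shows which case applies to that build.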

Having the __aeabi_fadd function linked in, with that extra call made each time through the loop, plus the additional time for memory accesses and possibly conversions (vmov) outside or inside the library function, can all add to what you are seeing. The answer, of course, is in the disassembly.
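
To make the disassembly easy to read, the question's two loops can be isolated from the engine. This is only a sketch under stated assumptions: it needs a C++11 compiler, it uses std::chrono::steady_clock in place of Env->GetSeconds(), and the names Timing, time_float_add and time_int_add are made up for this illustration.

```cpp
#include <chrono>

// volatile operands, as in the question, so each iteration really
// reloads them and the additions cannot be folded away.
volatile float af = 1.0f;
volatile float bf = 1.0f;
volatile int a = 1;
volatile int b = 1;

// Hypothetical result type: elapsed wall time plus the accumulated sum.
struct Timing
{
    double seconds;
    double checksum;
};

// Times `iterations` float additions.
Timing time_float_add( int iterations )
{
    auto start = std::chrono::steady_clock::now();
    float sum = 0.0f;
    for ( int i = 0; i != iterations; i++ )
    {
        sum += af + bf;
    }
    auto end = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>( end - start ).count(), sum };
}

// Times `iterations` int additions.
Timing time_int_add( int iterations )
{
    auto start = std::chrono::steady_clock::now();
    int sum = 0;
    for ( int i = 0; i != iterations; i++ )
    {
        sum += a + b;
    }
    auto end = std::chrono::steady_clock::now();
    return { std::chrono::duration<double>( end - start ).count(),
             static_cast<double>( sum ) };
}
```

Compiling this file with -O3 -S and the same -mfpu/-mfloat-abi flags as the real build, then searching the output of time_float_add for a call to __aeabi_fadd versus a VFP add (vadd.f32, or fadds in older assemblers), should show whether the float loop goes through the software library or the hardware.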
