When I call the function directly, execution takes 6.8 s.
Calling it from a single thread takes 3.4 s,
and using 2 threads takes 1.8 s. No matter which optimization level I use, the ratios stay the same.
In Visual Studio the times are as expected: 3.1, 3 and 1.7 s.
#include <math.h>
#include <stdio.h>
#include <windows.h>
#include <time.h>

#define N 400

float a[N][N];

// Column range [begin, end) processed by one call of thread()
struct b {
    int begin;
    int end;
};

DWORD WINAPI thread(LPVOID p)
{
    b b_t = *(b*)p;
    for (int i = 0; i < N; i++)
        for (int j = b_t.begin; j < b_t.end; j++)
        {
            a[i][j] = 0;
            for (int k = 0; k < i; k++)
                a[i][j] += k*sin(j) - j*cos(k);
        }
    return 0;
}

int main()
{
    clock_t t;
    HANDLE hn[2];
    b b_t[3];
    b_t[0].begin = 0;      // all columns, used for runs 0 and 1
    b_t[0].end = N;
    b_t[1].begin = 0;      // first half of the columns
    b_t[1].end = N/2;
    b_t[2].begin = N/2;    // second half of the columns
    b_t[2].end = N;

    // Run 0: plain function call on the main thread
    t = clock();
    thread(&b_t[0]);
    printf("0 - %ld\n", (long)(clock() - t));

    // Run 1: the same work on one explicitly created thread
    t = clock();
    hn[0] = CreateThread(NULL, 0, thread, &b_t[0], 0, NULL);
    WaitForSingleObject(hn[0], INFINITE);
    printf("1 - %ld\n", (long)(clock() - t));
    CloseHandle(hn[0]);

    // Run 2: the work split across two threads
    t = clock();
    hn[0] = CreateThread(NULL, 0, thread, &b_t[1], 0, NULL);
    hn[1] = CreateThread(NULL, 0, thread, &b_t[2], 0, NULL);
    WaitForMultipleObjects(2, hn, TRUE, INFINITE);
    printf("2 - %ld\n", (long)(clock() - t));
    CloseHandle(hn[0]);
    CloseHandle(hn[1]);
    return 0;
}
Times (in milliseconds):
0 - 6868
1 - 3362
2 - 1827
CPU: Core 2 Duo T9300
OS: Windows 8, 64-bit
Compiler: mingw32-g++.exe, gcc version 4.6.2
Edit:
I tried a different order of the calls with the same result; I even tried separate applications.
Task Manager shows CPU utilization around 50% for the direct call and for 1 thread, and 100% for 2 threads.
Sum of all elements after each call is the same: 3189909.237955
Cygwin result: 2.5, 2.5 and 2.5 sec
Linux result (pthread): 3.7, 3.7 and 2.1 sec
@borisbn's results: 0 - 1446, 1 - 1439, 2 - 721.
The difference is a result of something in the math library implementing sin() and cos(): if you replace the calls to those functions with something else that takes time, the significant difference between step 0 and step 1 goes away.

Note that I see the difference with gcc (tdm-1) 4.6.1, which is a 32-bit toolchain targeting 32-bit binaries. Optimization makes no difference (not surprising, since it seems to be something in the math library).

However, if I build using gcc (tdm64-1) 4.6.1, which is a 64-bit toolchain, the difference does not appear, regardless of whether the build creates a 32-bit program (using the -m32 option) or a 64-bit program (-m64).

Here are some example test runs (I made minor modifications to the source to make it C99 compatible):
Using the 32-bit TDM MinGW 4.6.1 compiler:
Using the 64-bit TDM 4.6.1 compiler:
A little more information:
The 32-bit TDM distribution (gcc (tdm-1) 4.6.1) links to the sin()/cos() implementations in the msvcrt.dll system DLL via a provided import library, while the 64-bit distribution (gcc (tdm64-1) 4.6.1) doesn't appear to do that, instead linking to some static library implementation provided with the distribution.
Update/Conclusion:
After a bit of spelunking in a debugger, stepping through the assembly of msvcrt.dll's implementation of cos(), I've found that the difference in timing between the main thread and an explicitly created thread is due to the FPU's precision being set to a non-default setting (presumably the MinGW runtime in question does this at startup). In the situation where the thread() function takes twice as long, the FPU is set to 64-bit precision (REAL10, or in MSVC-speak _PC_64). When the FPU control word is something other than 0x27f (the default state?), the msvcrt.dll runtime performs the following steps in the sin() and cos() functions (and probably other floating-point functions):

1. save the current FPU control word
2. set the FPU control word to 0x27f
3. perform the fsin/fcos operation
4. restore the saved FPU control word

The save/restore of the FPU control word is skipped if it's already set to the expected/desired 0x27f value. Apparently saving/restoring the FPU control word is expensive, since it appears to double the amount of time the function takes.
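To make the two control-word values concrete, here is a small decoder for the x87 control-word fields. It is a sketch based on the documented x87 layout (precision control in bits 8-9, rounding control in bits 10-11); the helper names are hypothetical, and `needs_control_word_swap()` only models the behavior observed in the debugger above, not msvcrt.dll's actual code.

```c
/* Precision-control (PC) field of the x87 FPU control word: bits 8-9. */
#define X87_PC_MASK 0x0300u

/* Significand precision in bits selected by the PC field,
   or 0 for the reserved encoding 01b. */
unsigned x87_precision_bits(unsigned cw)
{
    switch ((cw & X87_PC_MASK) >> 8) {
    case 0:  return 24;  /* single precision (REAL4) */
    case 2:  return 53;  /* double precision (REAL8) */
    case 3:  return 64;  /* extended precision (REAL10, _PC_64) */
    default: return 0;   /* 01b is reserved */
    }
}

/* Rounding-control (RC) field: bits 10-11; 0 means round to nearest. */
unsigned x87_rounding_mode(unsigned cw)
{
    return (cw >> 10) & 0x3u;
}

/* Hypothetical model of the observed check: the save/set/restore path
   is taken only when the control word differs from 0x27f. */
int needs_control_word_swap(unsigned cw)
{
    return cw != 0x27fu;
}
```

With these helpers, `x87_precision_bits(0x27f)` is 53 and `x87_precision_bits(0x37f)` is 64. 0x37f is the value the `FNINIT` instruction loads, which would be consistent with a runtime that reinitializes the FPU at startup and leaves it at 64-bit precision; the two words differ only in the PC field.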
You can solve the problem by adding the following line to main() before calling thread():
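A minimal sketch of such a fix, assuming a 32-bit MinGW build where `<float.h>` declares the MSVCRT function `_control87()` together with the `_MCW_PC` and `_PC_53` constants. It changes only the precision-control field to 53 bits, which turns a 0x37f control word into the 0x27f that msvcrt.dll expects. The helper name `use_53bit_precision()` is hypothetical, and on other platforms it deliberately does nothing.

```c
#include <float.h>

/* Hypothetical helper: switch the x87 precision-control field to 53 bits
   so the control word matches the 0x27f state msvcrt.dll's sin()/cos()
   expect. Returns 1 if the precision is now 53-bit, 0 if the change did
   not take effect, and -1 where this MSVCRT/x87 API does not apply. */
int use_53bit_precision(void)
{
#if defined(_WIN32) && (defined(__i386__) || defined(_M_IX86))
    /* _control87(new, mask): only the bits selected by _MCW_PC change. */
    _control87(_PC_53, _MCW_PC);
    /* A zero mask reads the control word without modifying it. */
    return (_control87(0, 0) & _MCW_PC) == _PC_53;
#else
    return -1;  /* not a 32-bit x86 Windows (MSVCRT) build */
#endif
}
```

Call it at the top of main(), before the first timing of thread(&b_t[0]). The trade-off is that x87 arithmetic on the main thread, including long double, is then rounded to a 53-bit significand, the same precision a newly created thread appears to get in the measurements above.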