I want to compare the speed of Python and C for string handling. I expected C to perform better than Python; however, I got the opposite result.
Here’s the C program:
#include <unistd.h>
#include <sys/time.h>
#define L (100*1024)
char s[L+1024];
char c[2*L+1024];
double time_diff( struct timeval et, struct timeval st )
{
    return 1e-6*((et.tv_sec - st.tv_sec)*1000000 + (et.tv_usec - st.tv_usec ));
}
int foo()
{
    strcpy(c,s);
    strcat(c+L,s);
    return 0;
}
int main()
{
    struct timeval st;
    struct timeval et;
    int i;
    //printf("s:%x\nc:%x\n", s,c);
    //printf("s=%d c=%d\n", strlen(s), strlen(c));
    memset(s, '1', L);
    //printf("s=%d c=%d\n", strlen(s), strlen(c));
    foo();
    //printf("s=%d c=%d\n", strlen(s), strlen(c));
    //s[1024*100-1]=0;
    gettimeofday(&st,NULL);
    for( i = 0 ; i < 1000; i++ ) foo();
    gettimeofday(&et,NULL);
    printf("%f\n", time_diff(et,st));
    return 0;
}
and this is the Python one:
import time
s = '1'*102400
def foo():
    c = s + s
    #assert( len(c) == 204800 )
st = time.time()
for x in xrange(1000):
    foo()
et = time.time()
print (et-st)
and what I get:
root@xkqeacwf:~/lab/wfaster# python cp100k.py
0.027932882309
root@xkqeacwf:~/lab/wfaster# gcc cp100k.c
root@xkqeacwf:~/lab/wfaster# ./a.out
0.061820
Does that make sense? Or am I just making a stupid mistake?
Accumulated comments (mainly from me) converted into an answer:
What happens if you use memmove() or memcpy() instead of strcpy() and strcat()? (I note that the strcat() could be replaced with strcpy() with no difference in result; it might be interesting to check the timing.) Also, you didn’t include <string.h> (or <stdio.h>), so you’re missing any optimizations that <string.h> might provide!

memmove() has a simpler test to make on each iteration (count), not ‘is this a null byte’.

memmove() is highly optimized assembler, possibly inline (no function call overhead, though for 100 KiB of data, the function call overhead is minimal). The benefits are from the bigger moves (typically a machine word or more at a time rather than a single byte) and the simpler loop condition.

Python knows the length of its strings, so it can use memmove() or memcpy() (the difference being that memmove() works correctly even if the source and destination overlap; memcpy() is not obliged to work correctly if they overlap). It is relatively unlikely that they’ve got anything faster than memmove/memcpy available.

I modified the C code to produce more stable timings for me on my machine (Mac OS X 10.7.4, 8 GiB 1333 MHz RAM, 2.3 GHz Intel Core i7, GCC 4.7.1), and to compare strcpy() and strcat() vs memcpy() vs memmove(). Note that I increased the loop count from 1000 to 10000 to improve the stability of the timings, and I repeat the whole test (of all three mechanisms) 10 times. Arguably, the timing loop count should be increased by another factor of 5-10 so that the timings are over a second.

That gives no warnings when compiled with:

The timing I got was:

What is weird is that if I forgo ‘no warnings’ and omit the <string.h> and <stdio.h> headers, as in the original posted code, the timings I got are:

Eyeballing those results, it seems to be faster than the ‘cleaner’ code, though I’ve not run a Student’s t-Test on the two sets of data, and the timings have very substantial variability (but I do have things like Boinc running 8 processes in the background). The effect seemed to be more pronounced in the early versions of the code, when it was just strcpy() and strcat() that was tested. I have no explanation for that, if it is a real effect!

Followup by mvds
Since the question was closed I cannot answer properly. On a Mac doing virtually nothing, I get these timings:
(with headers)
(without headers, ignoring warnings)
Preprocessor output (-E flag) shows that including the headers translates strcpy into builtin calls like:

So the libc version of strcpy outperforms the gcc builtin. (Using gdb it is easily verified that a breakpoint on strcpy indeed doesn’t break on the strcpy() call, if the headers are included.)

On Linux (Debian 5.0.9, amd64), the differences seem to be negligible. The generated assembly (-S flag) only differs in debugging information carried by the includes.
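
For anyone who wants to try the length-aware copy discussed in the answer, here is a minimal sketch. It is not the answerer's benchmark code (that listing isn't reproduced above); the names concat_str, concat_mem and time_concat are hypothetical and chosen only for illustration. It assumes the destination buffer is large enough and that the source and destination do not overlap, which is what memcpy() requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Null-terminated approach, as in the question: strcpy() copies until it
   finds the null byte, and strcat() first rescans dst to find its end. */
static void concat_str(char *dst, const char *src)
{
    strcpy(dst, src);
    strcat(dst, src);
}

/* Length-aware approach: the caller supplies len, so each copy is a
   counted block move. dst must have room for 2*len + 1 bytes. */
static void concat_mem(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    memcpy(dst + len, src, len);
    dst[2 * len] = '\0';
}

/* Illustrative timing helper: CPU seconds spent on `reps` calls of the
   chosen variant (use_mem != 0 selects concat_mem). */
static double time_concat(int use_mem, char *dst, const char *src,
                          size_t len, int reps)
{
    int i;
    clock_t start = clock();
    for (i = 0; i < reps; i++) {
        if (use_mem)
            concat_mem(dst, src, len);
        else
            concat_str(dst, src);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With the question's buffers, something like time_concat(0, c, s, L, 1000) versus time_concat(1, c, s, L, 1000) should give a rough comparison. Note that clock() reports CPU time rather than wall-clock time, and an optimizing compiler may treat the repeated copies differently from an unoptimized build, so expect the numbers to vary by platform and flags.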