The Archive Base


Editorial Team
Asked: June 11, 2026


Currently I want to compare the speed of Python and C when they’re used to do string operations. I expected C to give better performance than Python; however, I got the opposite result.

Here’s the C program:

#include <unistd.h>
#include <sys/time.h>

#define L (100*1024)

char s[L+1024];
char c[2*L+1024];

double time_diff( struct timeval et, struct timeval st )
{
    return 1e-6*((et.tv_sec - st.tv_sec)*1000000 + (et.tv_usec - st.tv_usec ));
}

int foo()
{
    strcpy(c,s);
    strcat(c+L,s);
    return 0;
}

int main()
{
    struct timeval st;
    struct timeval et;
    int i;
    //printf("s:%x\nc:%x\n", s,c);

    //printf("s=%d c=%d\n", strlen(s), strlen(c));
    memset(s, '1', L);
    //printf("s=%d c=%d\n", strlen(s), strlen(c));
    foo();
    //printf("s=%d c=%d\n", strlen(s), strlen(c));
    //s[1024*100-1]=0;

    gettimeofday(&st,NULL);
    for( i = 0 ; i < 1000; i++ ) foo();
    gettimeofday(&et,NULL);

    printf("%f\n", time_diff(et,st));
    return 0;
}

and this is the Python one:

import time

s = '1'*102400
def foo():
    c = s + s
    #assert( len(c) == 204800 )

st = time.time()
for x in xrange(1000):
    foo()
et = time.time()

print (et-st)

And this is what I get:

root@xkqeacwf:~/lab/wfaster# python cp100k.py 
0.027932882309
root@xkqeacwf:~/lab/wfaster# gcc cp100k.c
root@xkqeacwf:~/lab/wfaster# ./a.out 
0.061820

Does that make sense? Or am I just making some stupid mistake?


1 Answer

    Editorial Team
    Answered: June 11, 2026 at 8:37 am

    Accumulated comments (mainly from me) converted into an answer:

    • What happens if you use your knowledge of the lengths of the strings and use memmove() or memcpy() instead of strcpy() and strcat()? (I note that the strcat() could be replaced with strcpy() with no difference in result; it might be interesting to check the timing.) Also, you didn’t include <string.h> (or <stdio.h>), so you’re missing any optimizations that <string.h> might provide!

    Marcus: Yes, memmove() is faster than strcpy() and faster than Python, but why? Does memmove() copy a word at a time?

    • Yes; on a 64-bit machine, for nicely aligned data, it can move 64 bits at a time instead of 8 bits at a time; on a 32-bit machine, likely 32 bits at a time. It also has a simpler test to make on each iteration (the count) rather than ‘is this a null byte’.

    Marcus: But memmove() still works well even after I set L=L-13, and sizeof(s) gives L+1024-13. My machine has sizeof(int)==4.

    • The code for memmove() is highly optimized assembler, possibly inline (no function call overhead, though for 100KiB of data, the function call overhead is minimal). The benefits are from the bigger moves and the simpler loop condition.

    Marcus: So does Python use memmove() as well, or something magic?

    • I’ve not looked at the Python source, but it is practically a certainty that it keeps track of the length of its strings (they’re null terminated, but Python always knows how long the active part of the string is). Knowing that length allows Python to use memmove() or memcpy() (the difference being that memmove() works correctly even if the source and destination overlap; memcpy() is not obliged to work correctly if they overlap). It is relatively unlikely that they’ve got anything faster than memmove/memcpy available.

    I modified the C code to produce more stable timings for me on my machine (Mac OS X 10.7.4, 8 GiB 1333 MHz RAM, 2.3 GHz Intel Core i7, GCC 4.7.1), and to compare strcpy() and strcat() vs memcpy() vs memmove(). Note that I increased the loop count from 1000 to 10000 to improve the stability of the timings, and I repeat the whole test (of all three mechanisms) 10 times. Arguably, the timing loop count should be increased by another factor of 5-10 so that the timings are over a second.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/time.h>
    
    #define L (100*1024)
    
    char s[L+1024];
    char c[2*L+1024];
    
    static double time_diff( struct timeval et, struct timeval st )
    {
        return 1e-6*((et.tv_sec - st.tv_sec)*1000000 + (et.tv_usec - st.tv_usec ));
    }
    
    static int foo(void)
    {
        strcpy(c,s);
        strcat(c+L,s);
        return 0;
    }
    
    static int bar(void)
    {
        memcpy(c + 0, s, L);
        memcpy(c + L, s, L);
        return 0;
    }
    
    static int baz(void)
    {
        memmove(c + 0, s, L);
        memmove(c + L, s, L);
        return 0;
    }
    
    static void timer(void)
    {
        struct timeval st;
        struct timeval et;
        int i;
    
        memset(s, '1', L);
        foo();
    
        gettimeofday(&st,NULL);
        for( i = 0 ; i < 10000; i++ )
            foo();
        gettimeofday(&et,NULL);
        printf("foo: %f\n", time_diff(et,st));
    
        gettimeofday(&st,NULL);
        for( i = 0 ; i < 10000; i++ )
            bar();
        gettimeofday(&et,NULL);
        printf("bar: %f\n", time_diff(et,st));
    
        gettimeofday(&st,NULL);
        for( i = 0 ; i < 10000; i++ )
            baz();
        gettimeofday(&et,NULL);
        printf("baz: %f\n", time_diff(et,st));
    }
    
    int main(void)
    {
        for (int i = 0; i < 10; i++)
            timer();
        return 0;
    }
    

    That gives no warnings when compiled with:

    gcc -O3 -g -std=c99 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
        -Wold-style-definition cp100k.c -o cp100k
    

    The timings I got were:

    foo: 1.781506
    bar: 0.155201
    baz: 0.144501
    foo: 1.276882
    bar: 0.187883
    baz: 0.191538
    foo: 1.090962
    bar: 0.179188
    baz: 0.183671
    foo: 1.898331
    bar: 0.142374
    baz: 0.140329
    foo: 1.516326
    bar: 0.146018
    baz: 0.144458
    foo: 1.245074
    bar: 0.180004
    baz: 0.181697
    foo: 1.635782
    bar: 0.136308
    baz: 0.139375
    foo: 1.542530
    bar: 0.138344
    baz: 0.136546
    foo: 1.646373
    bar: 0.185739
    baz: 0.194672
    foo: 1.284208
    bar: 0.145161
    baz: 0.205196
    

    What is weird is that if I forgo ‘no warnings’ and omit the <string.h> and <stdio.h> headers, as in the original posted code, the timings I get are:

    foo: 1.432378
    bar: 0.123245
    baz: 0.120716
    foo: 1.149614
    bar: 0.186661
    baz: 0.204024
    foo: 1.529690
    bar: 0.104873
    baz: 0.105964
    foo: 1.356727
    bar: 0.150993
    baz: 0.135393
    foo: 0.945457
    bar: 0.173606
    baz: 0.170719
    foo: 1.768005
    bar: 0.136830
    baz: 0.124262
    foo: 1.457069
    bar: 0.130019
    baz: 0.126566
    foo: 1.084092
    bar: 0.173160
    baz: 0.189040
    foo: 1.742892
    bar: 0.120824
    baz: 0.124772
    foo: 1.465636
    bar: 0.136625
    baz: 0.139923
    

    Eyeballing those results, the version without the headers seems to be faster than the ‘cleaner’ code, though I’ve not run a Student’s t-test on the two sets of data, and the timings vary considerably (but I do have things like Boinc running 8 processes in the background). The effect seemed to be more pronounced in the early versions of the code, when only strcpy() and strcat() were tested. I have no explanation for that, if it is a real effect!

    Follow-up by mvds

    Since the question was closed, I cannot answer properly. On a Mac doing virtually nothing, I get these timings:

    (with headers)

    foo: 1.694667 bar: 0.300041 baz: 0.301693
    foo: 1.696361 bar: 0.305267 baz: 0.298918
    foo: 1.708898 bar: 0.299006 baz: 0.299327
    foo: 1.696909 bar: 0.299919 baz: 0.300499
    foo: 1.696582 bar: 0.300021 baz: 0.299775
    

    (without headers, ignoring warnings)

    foo: 1.185880 bar: 0.300287 baz: 0.300483
    foo: 1.120522 bar: 0.299585 baz: 0.301144
    foo: 1.122017 bar: 0.299476 baz: 0.299724
    foo: 1.124904 bar: 0.301635 baz: 0.300230
    foo: 1.120719 bar: 0.300118 baz: 0.299673
    

    Preprocessor output (-E flag) shows that including the headers translates strcpy into builtin calls like:

    ((__builtin_object_size (c, 0) != (size_t) -1) ? __builtin___strcpy_chk (c, s, __builtin_object_size (c, 2 > 1)) : __inline_strcpy_chk (c, s));
    ((__builtin_object_size (c+(100*1024), 0) != (size_t) -1) ? __builtin___strcat_chk (c+(100*1024), s, __builtin_object_size (c+(100*1024), 2 > 1)) : __inline_strcat_chk (c+(100*1024), s));
    

    So the libc version of strcpy outperforms the gcc builtin. (Using gdb, it is easy to verify that a breakpoint on strcpy indeed doesn’t trigger on the strcpy() call if the headers are included.)

    On Linux (Debian 5.0.9, amd64), the differences seem to be negligible. The generated assembly (-S flag) only differs in debugging information carried by the includes.


Related Questions

Currently, I have two matrixes, and want to compare it and see whether they
I'm currently using MySQL workbench. I want to see the difference in performance as
I want to compare two collections in C# that I'm currently doing using nested
I'm a beginning programmer in Ruby on Rails 3. I currently want to implement
Currently since I want to access user information in all my templates, I always
I am creating a game currently and I want users to be able to
currently my company has webapplications which want to provide services regarding the users current
I want to get the currently selected item (text, image, etc) and display in
I want to get the Height and Width of an Image.Currently what im doing
I currently have links w/ class=ajax that I want to retrieve the element with

© 2021 The Archive Base. All Rights Reserved