in an effort to solve question #3367795 here on SO i have to cope with a number of subproblems. one of these is: in said algorithm (levenshtein distance), several arrays are allocated in memory and initialized with the lines
cdef char *m1 = <char *>calloc( blen + 2, sizeof( char ) )
cdef char *m2 = <char *>calloc( blen + 2, sizeof( char ) )
cdef char *m3 = <char *>malloc( ( blen + 2 ) * sizeof( char ) )
#.........................................................................
for i from 0 <= i <= blen:
    m2[ i ] = i
<...snip...>
blen here refers to the length of a Python bytes variable. now as far as i understand the algorithm (see my original post for the full code) and as the code for the initialization of m2 clearly shows, these arrays are meant to hold integer numbers, not characters, so one would think the correct allocations should look like
cdef int *m3 = <int *>malloc( ( blen + 2 ) * sizeof( int ) )
and so on. can anyone with a background in C elucidate to me why char is used? also, maybe more for people inclined to Cython, why is there a cast <char *>? one would think that char *x = malloc( ... ) should suffice to define x.
Quite simply, to save memory. But please note carefully that declaring these arrays as char limits the result distance to either 127 or 255, depending on whether the C compiler defaults to signed char or unsigned char respectively. In C, char is an integer type; you don't need an ord() to get its integer value.
Your original code contains no mention of this limitation. Note that if a char overflows, it does so silently and the code will produce incorrect results: 127 + 1 -> -128 (signed); 255 + 1 -> 0 (unsigned).
You didn't respond to my comment on your original question: "What are the (a) maximum (b) average sizes of your strings? Do you really need to do the whole O(M*N) thing if the two strings are nothing like each other?" Please answer that now (edit your question); had you done so then, you would have had this question answered then.
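To make the char limitation concrete, here is a minimal C sketch, not taken from the original code. The helper names fits_in_char, stored_in_unsigned_char and make_int_row are hypothetical, invented for illustration. It checks whether a distance fits in a plain char cell, shows the silent wrap for the unsigned case, and allocates an int row initialized the same way as m2. In plain C the void * returned by calloc() or malloc() converts implicitly to any object pointer type, so no cast appears here; C++ requires the cast, and the Cython code in the question writes it explicitly.

```c
#include <limits.h>
#include <stdlib.h>

/* Returns 1 if an edit distance can be stored in a plain char cell on this
   platform. CHAR_MAX is 127 when char is signed and 255 when it is unsigned
   (assuming 8-bit chars). */
static int fits_in_char(int distance) {
    return distance >= 0 && distance <= CHAR_MAX;
}

/* Shows the silent wrap for the unsigned case: conversion to unsigned char
   reduces the value modulo UCHAR_MAX + 1, so 256 comes back as 0. */
static int stored_in_unsigned_char(int distance) {
    unsigned char cell = (unsigned char)distance;
    return cell;
}

/* Hypothetical int-based counterpart of the m2 setup in the question:
   zero-filled row of blen + 2 cells, with cells 0..blen set to 0..blen.
   The caller owns the result and must free() it. */
static int *make_int_row(size_t blen) {
    int *row = calloc(blen + 2, sizeof *row);  /* no cast needed in C */
    if (row == NULL) {
        return NULL;
    }
    for (size_t i = 0; i <= blen; i++) {
        row[i] = (int)i;
    }
    return row;
}
```

With int cells a row for a 300-byte string holds the value 300 without trouble, at the cost of sizeof(int) bytes per cell instead of one.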
Update: Reading the original post again, I’ve noticed a problem: The code that reads
is WRONG on three grounds: (1) it doesn't shuffle the rows properly (should do strcpy() before swapping m1 and m2) (2) strcpy() will not copy anything beyond the first null (zero byte) (3) there is no need to copy anything, just shuffle the pointers
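Points (2) and (3) can be sketched in C as follows. This is an illustration under assumptions, not the original code: the function names are hypothetical, and the rotation order (old m2 becomes m1, old m3 becomes m2, old m1 is recycled as m3) is a guess at what a three-row scheme would want, since the snippet being criticized is not reproduced here.

```c
#include <string.h>

/* Point (3): rotate the three row pointers in O(1) without copying any
   cells. Assumed order: m1 <- m2, m2 <- m3, and the old m1 buffer is
   recycled as the new m3. */
static void rotate_rows(char **m1, char **m2, char **m3) {
    char *oldest = *m1;
    *m1 = *m2;
    *m2 = *m3;
    *m3 = oldest;
}

/* Point (2): strcpy() stops at the first zero byte, so a row that contains
   a 0 cell is only partially copied. Returns the number of cells that
   precede the terminator in dst after the copy. src must contain a zero
   byte within its bounds, and dst must be at least as large as src. */
static size_t cells_copied_by_strcpy(char *dst, const char *src) {
    strcpy(dst, src);
    return strlen(dst);
}
```

For a row holding the cells 3, 0, 2, 1, the strcpy() helper transfers only the 3 and the terminating zero, and the remaining cells of the destination keep their old contents. If a copy really were needed, memcpy() with an explicit length would be the call to reach for, but swapping pointers avoids the copy altogether.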