I have a function which just basically makes lots of calls to a simple defined hash function and tests to see when it finds a duplicate. I need to do lots of simulations with it so would like it to be as fast as possible. I am attempting to use cython to do this. The cython code is currently called with a normal python list of integers with values in the range 0 to m^2.
import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls
def h3(int a,int b,int c,int d, int m,int x):
return (a*x**2 + b*x+c) %m
def floyd(inputx):
dupefound, nohashcalls = (0,0)
m = len(inputx)
loops = int(m*math.log(m))
for loopno in xrange(loops):
if (dupefound == 1):
break
a = random.randrange(m)
b = random.randrange(m)
c = random.randrange(m)
d = random.randrange(m)
pos = random.randrange(m)
value = inputx[pos]
listofpos = [0] * m
listofpos[pos] = 1
setofvalues = set([value])
cyclelimit = int(math.sqrt(m))
for j in xrange(cyclelimit):
pos = h3(a,b, c,d, m, inputx[pos])
nohashcalls += 1
if (inputx[pos] in setofvalues):
if (listofpos[pos]==1):
dupefound = 0
else:
dupefound = 1
print "Duplicate found at position", pos, " and value", inputx[pos]
break
listofpos[pos] = 1
setofvalues.add(inputx[pos])
return dupefound, nohashcalls
How can I convert inputx and listofpos to use C type arrays and to access the arrays at C speed? Are there any other speed ups I can use? Can setofvalues be sped up?
So that there is something to compare against, 50 calls to floyd() with m = 5000 currently takes around 30 seconds on my computer.
Update: Example code snippet to show how floyd is called.
m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)
First of all, it seems that you must type the variables inside the function. A good example of it is here.
Second,
cython -a, for “annotate”, gives you a really excellent break down of the code generated by the cython compiler and a color-coded indication of how dirty (read: python api heavy) it is. This output is really essential when trying to optimize anything.Third, the now famous page on working with Numpy explains how to get fast, C-style access to the Numpy array data. Unforunately it’s verbose and annoying. We’re in luck however, because more recent Cython provides Typed Memory Views, which are both easy to use and awesome. Read that entire page before you try to do anything else.
After ten minutes or so I came up with this:
There are no tricks here that aren’t explained on docs.cython.org, which is where I learned them myself, but helps to see it all come together.
The most important changes to your original code are in the comments, but they all amount to giving Cython hints about how to generate code that doesn’t use the Python API.
As an aside: I really don’t know why
infer_typesis not on by default. It lets the compilerimplicitly use C types instead of Python types where possible, meaning less work for you.
If you run
cython -aon this, you’ll see that the only lines that call into Python are your calls to random.sample, and building or adding to a Python set().On my machine, your original code runs in 2.1 seconds. My version runs in 0.6 seconds.
The next step is to get random.sample out of that loop, but I’ll leave that to you.I have edited my answer to demonstrate how to precompute the rand samples. This brings the time down to 0.4 seconds.