Novice programmer here. I’m writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.
My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I’m planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference.
I tried running a profiler and array calls are 1/4 of the time, almost all of the rest is unspecified loop time.
Here is the python code for the main loop:
for basecell in range (0, cellnumber-1):
if firstcelltype == np.array((cellrecord[basecell,2])):
xloc=np.array((cellrecord[basecell,0]))
yloc=np.array((cellrecord[basecell,1]))
xedgedist=(xbound-xloc)
yedgedist=(ybound-yloc)
if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist:
for comparecell in range (0, cellnumber-1):
if secondcelltype==np.array((cellrecord[comparecell,2])):
xcomploc=np.array((cellrecord[comparecell,0]))
ycomploc=np.array((cellrecord[comparecell,1]))
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
Here is the matlab code for the main loop:
for basecell = 1:cellnumber;
if firstcelltype==cellrecord(basecell,3);
xloc=cellrecord(basecell,1);
yloc=cellrecord(basecell,2);
xedgedist=(xbound-xloc);
yedgedist=(ybound-yloc);
if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
for comparecell = 1:cellnumber;
if secondcelltype==cellrecord(comparecell,3);
xcomploc=cellrecord(comparecell,1);
ycomploc=cellrecord(comparecell,2);
dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
if (dist>=1) && (dist<=100.4999);
arraytarget=round(dist*analysisdist/intervalnumber);
spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
end
end
end
end
end
end
Thanks!
Here are some ways to speed up your python code.
First: Don’t make np arrays when you are only storing one value. You do this many times over in your code. For instance,
can just be
I’ll show you why with some timeit statements:
That’s an order of magnitude in difference between those calls.
Second: The following code:
can be replaced with
Third: You can get rid of the sqrt as Philip mentioned by squaring
analysisdistbeforehand. However, since you useanalysisdistto getarraytarget, you might want to create a separate variable,analysisdist2that is the square of analysisdist and use that for your comparison.Fourth: You are looking for cells that match
secondcelltypeevery time you get to that point rather than finding those one time and using the list over and over again. You could define an array:and then replace
with
Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you’re using a somewhat recent version. This should speed-up your code a bit.
Sixth: If the code still isn’t fast enough after all previous steps, then you should try vectorizing your code. It shouldn’t be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here’s my try at vectorizing:
You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same. Also, I didn’t actually try out all of the code, so there could be a bug or typo in there. Hopefully, it gives you a good idea of how to do this. Oh, one more thing. You make
analysisdist/intervalnumbera separate variable to avoid doing that division over and over again.