I have some board numpy arrays like that:
array([[0, 0, 0, 1, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 1, 0, 0]])
And I’m using the following code to find the sum of elements on each nth diagonal from -7 to 8 of the board (and the mirrored version of it).
n = 8
rate = [b.diagonal(i).sum()
for b in (board, board[::-1])
for i in range(-n+1, n)]
After some profiling, this operation is taking about 2/3 of overall running time and it seems to be because of 2 factors:
- The
.diagonalmethod builds a new array instead of a view (looks like numpy 1.7 will have a new.diagmethod to solve that) - The iteration is done in python inside the list comprehension
So, there are any methods to find these sums faster (possibly in the C layer of numpy)?
After some more tests, I could reduce 7.5x the total time by caching this operation… Maybe I was looking for the wrong bottleneck?
One more thing:
Just found the .trace method that replaces the diagonal(i).sum() thing and… There wasn’t much improvement in performance (about 2 to 4%).
So the problem should be the comprehension. Any ideas?
There’s a possible solution using
stride_tricks. This is based in part on the plethora of information available in the answers to this question, but the problem is just different enough, I think, not to count as a duplicate. Here’s the basic idea, applied to a square matrix — see below for a function implementing the more general solution.Of course, I have no idea how fast this will actually be. But I bet it will be faster than a Python list comprehension.
For convenience, here’s a fully general
diagonalsfunction. It assumes you want to move the diagonal along the longest axis: