I have a matrix, matrix_logical(50000,100000), that is a sparse logical matrix (a lot of falses, some true). I have to produce a matrix, intersect(50000,50000), that, for each pair, i,j, of rows of matrix_logical(50000,100000), stores the number of columns for which rows i and j have both “true” as the value.
Here is the code I wrote:
% store in advance the nonzeros cols
for i=1:50000
nonzeros{i} = num2cell(find(matrix_logical(i,:)));
end
intersect = zeros(50000,50000);
for i=1:49999
a = cell2mat(nonzeros{i});
for j=(i+1):50000
b = cell2mat(nonzeros{j});
intersect(i,j) = numel(intersect(a,b));
end
end
Is it possible to further increase the performance? It takes too long to compute the matrix. I would like to avoid the double loop in the second part of the code.
matrix_logical is sparse, but it is not saved as sparse in MATLAB because otherwise the performance become the worst possible.
Since the [i,j] entry counts the number of non zero elements in the element-wise multiplication of rows i and j, you can do it by multiplying
matrix_logicalwith its transpose (you should convert to numeric data type first, e.gmatrix_logical = single(matrix_logical)):And it works both for sparse or full representation.
EDIT
In order to calculate
numel(intersect(a,b))/numel(union(a,b));(as asked in your comment), you can use the fact that for two setsaandb, you haveso, you can do the following: