I need to run a block of code like the following:
```matlab
x = some_number;
y = some_other_number;
u = a_vector_of_numbers;
v = another_vector_of_numbers;
% u and v are of equal size
r1 = ((x == u) | (x == v)); % Expensive!
r2 = ((y == u) | (y == v)); % Expensive!
q = any(r1 & r2);
```
You can think of this as follows: x and y are two nodes on a graph, and unless I am mistaken, this checks whether x and y are directly connected (adjacent) using the edge list [u, v]. In other words, I am trying to answer the question: “Is there an index i such that both x and y can be found at u(i) or v(i)?”
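To make the semantics concrete, here is the same test run on a tiny hypothetical graph with edges 1-2, 2-3 and 3-4. The anonymous function `hasEdge` is just an illustrative wrapper around the two expensive lines; note that it captures `u` and `v` at the moment it is created, and that it checks for a single shared edge, not for a path through other nodes.

```matlab
% Toy edge list (hypothetical): edge i joins node u(i) and node v(i).
% Edges: 1-2, 2-3, 3-4
u = [1 2 3];
v = [2 3 4];

% Same test as in the question: true if some edge i has both
% x and y among its two endpoints u(i), v(i).
hasEdge = @(x, y) any(((x == u) | (x == v)) & ((y == u) | (y == v)));

hasEdge(1, 2)   % true: 1-2 is an edge
hasEdge(1, 3)   % false: 1 and 3 are only linked through node 2
```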
I need to do this repeatedly. Both u and v can potentially contain up to thousands of unique values (number of nodes on the graph on the order of 10^4), and their length is in the hundreds of thousands (number of edges on the order of 10^6).
My profiler tells me the two lines I have marked with comments consume 99% of the run time, and my program takes quite a while to run, so I am wondering: how much more can this be optimized? What is the fundamental lower limit on the computation time, and how close to it am I?
Also, it would be quite easy to outsource this particular code to another language. Could doing that ever result in a significant performance gain?
I haven’t tested this suggestion (it’s too much effort to set up realistic test data), but …
Have you tried creating an adjacency matrix for your graph and using that for your enquiries? Creating the matrix (once) would be a relatively expensive operation, but after that, checking for the presence of an edge would be much cheaper than scanning both vectors u and v on every query (I think).
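A minimal sketch of the adjacency-matrix idea, assuming node labels are positive integers 1..n and the graph is undirected (each edge is stored in both directions). A sparse matrix seems the natural choice here, since with ~10^4 nodes and ~10^6 edges most entries are zero. The names `n` and `isAdjacent` are illustrative only. One caveat: this agrees with the original scan only when x ≠ y. For x == y the scan returns true whenever x has any edge at all, whereas the matrix lookup only reports a self-loop.

```matlab
% Toy edge list (hypothetical): edges 1-2, 2-3, 3-4.
u = [1 2 3];
v = [2 3 4];
n = max([u(:); v(:)]);   % number of nodes (assumes labels are 1..n)

% Build once: store each edge in both directions (undirected graph).
% sparse() sums duplicate (i, j) pairs, so test entries against zero.
A = sparse([u(:); v(:)], [v(:); u(:)], 1, n, n);

% Each query is now a single lookup instead of a scan over all edges.
isAdjacent = @(x, y) full(A(x, y)) ~= 0;

isAdjacent(1, 2)   % true
isAdjacent(3, 2)   % true (edges work in both directions)
isAdjacent(1, 3)   % false
```

If queries arrive in batches, they can be answered in one vectorized call, for example `full(A(sub2ind([n n], X, Y))) ~= 0` for equal-length query vectors X and Y.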
If you stick with your current algorithm (or, more to the point, with your current data structure) I’d be surprised if you got much speed-up simply by offloading the work to an implementation in another language. Using another language doesn’t change the fact that you are reading through long vectors of data looking for values.
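Since realistic test data was the sticking point, here is a rough synthetic benchmark you could adapt. It assumes uniformly random edges (seed 0, n = 1e4 nodes, m = 1e6 edges), which may not resemble your real graph, and it picks y ≠ x so that both methods answer the same question. The build step is timed separately because that cost is paid only once. No timings are claimed here; the numbers will depend on your machine and MATLAB version.

```matlab
rng(0);                  % reproducible random data
n = 1e4;                 % number of nodes (assumed)
m = 1e6;                 % number of edges (assumed)
u = randi(n, m, 1);      % random edge endpoints (synthetic, not real data)
v = randi(n, m, 1);
x = randi(n);
y = mod(x, n) + 1;       % guarantees y ~= x while staying in 1..n

% Original approach: scan the whole edge list on every query.
scan = @() any(((x == u) | (x == v)) & ((y == u) | (y == v)));

% Adjacency-matrix approach: build once, then look up.
build = @() sparse([u; v], [v; u], 1, n, n);
A = build();
lookup = @() full(A(x, y)) ~= 0;

% Both approaches should give the same answer for x ~= y.
qScan = scan();
qLookup = lookup();

tScan   = timeit(scan);
tBuild  = timeit(build);
tLookup = timeit(lookup);

fprintf('scan: %.3g ms/query, lookup: %.3g ms/query, build (once): %.3g ms\n', ...
        1e3 * tScan, 1e3 * tLookup, 1e3 * tBuild);
```

Roughly tBuild / (tScan - tLookup) queries are needed before the one-off build cost pays for itself.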