I am trying to accomplish the following logical operation in Python but am running into memory and time issues. Since I am very new to Python, guidance on how and where to optimize the problem would be appreciated! (I do understand that the following question is somewhat abstract.)
import networkx as nx

dic_score = {}
# Generate 2 graphs with 10,000 nodes each using NetworkX
G = nx.watts_strogatz_graph(10000, 10, .01)
H = nx.watts_strogatz_graph(10000, 10, .01)
for Gnodes in G.nodes():
    for Hnodes in H.nodes():  # i.e. for every pair of nodes across the two graphs
        score = SomeOperation(Gnodes, Hnodes)  # placeholder: calculate a metric
        # Store the metric as key: value, where the value becomes a list of lists
        dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])
Then sort the lists in the generated dictionary according to the criterion mentioned here: sorting_criterion
My problems/questions are:
1) Is there a better way of approaching this than using the for loops for iteration?
2) What would be the fastest method of approaching the above-mentioned problem? Should I consider using a data structure other than a dictionary, or possibly file operations?
3) Since I need to sort the lists inside this dictionary, which has 10,000 keys each corresponding to a list of 10,000 values, memory requirements become huge quite quickly and I run out of it.
3) Is there a way to integrate the sorting process into the construction of the dictionary itself, i.e. avoid a separate loop for sorting?
Any input would be appreciated! Thanks!
1) You can use one of the functions from the itertools module for that. I'll just mention it here; you can read the manual or call Python's built-in help() on it. Here's an example:
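A minimal sketch, assuming `itertools.product` is the function meant here. `some_operation` is a hypothetical stand-in for the real metric, and the small `range` objects stand in for `G.nodes()` and `H.nodes()`:

```python
from itertools import product

def some_operation(g_node, h_node):
    # Hypothetical placeholder metric; replace with the real calculation.
    return abs(g_node - h_node)

# Small stand-ins for G.nodes() and H.nodes(); any iterables work here.
g_nodes = range(3)
h_nodes = range(4)

dic_score = {}
# product() yields every (g, h) pair lazily, replacing the nested for loops.
for g, h in product(g_nodes, h_nodes):
    dic_score.setdefault(g, []).append([h, some_operation(g, h), -1])

print(len(dic_score), len(dic_score[0]))
```

Note that `product` only tidies up the iteration. With 10,000 nodes per graph the metric is still evaluated 100,000,000 times, so it does not reduce the amount of work by itself.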
2) If the result is too big to fit in memory, try saving it somewhere. You can write it out to a CSV file, for example:
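A rough sketch of that idea using the standard `csv` module. The file name `scores.csv`, the column names, and `some_operation` are all hypothetical choices made for illustration:

```python
import csv
import os
import tempfile
from itertools import product

def some_operation(g_node, h_node):
    # Hypothetical placeholder metric; replace with the real calculation.
    return abs(g_node - h_node)

# Small stand-ins for G.nodes() and H.nodes().
g_nodes = range(3)
h_nodes = range(4)

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "scores.csv")

with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["g_node", "h_node", "score"])
    # Each row is written as soon as it is computed, so the full
    # result never has to be held in memory at once.
    for g, h in product(g_nodes, h_nodes):
        writer.writerow([g, h, some_operation(g, h)])
```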
This will free your memory.
3) I think it's better to sort the result data afterwards (because the sort function is rather quick) rather than complicate matters and sort the data on the fly.
You could instead use NumPy array/matrix operations (sums, products, or even mapping a function over each matrix row). These are so fast that sometimes filtering the data costs more than calculating everything.
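As an illustration of the NumPy approach, here is a small sketch that assumes the metric can be written as an array expression. The per-node feature arrays and the absolute-difference metric are hypothetical, and ascending order is assumed because the actual sorting criterion is not shown:

```python
import numpy as np

# Hypothetical per-node features (e.g. node degrees); stand-ins for real data.
g_feat = np.array([3.0, 1.0, 4.0])
h_feat = np.array([2.0, 5.0, 1.0, 3.0])

# Broadcasting builds the full score matrix in one step:
# scores[i, j] is the metric for G node i and H node j.
scores = np.abs(g_feat[:, None] - h_feat[None, :])

# Sort afterwards: order[i] lists the H nodes by ascending score for G node i.
order = np.argsort(scores, axis=1, kind="stable")

print(scores.shape)
print(order)
```

For 10,000 x 10,000 pairs a float64 score matrix takes about 800 MB, so it may be worth using float32 or processing the G nodes in blocks of rows.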
If your app is still very slow, try profiling it to see exactly what operation is slow or is done too many times:
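One possible way to do this is with the standard `cProfile` and `pstats` modules. In this sketch `build_scores` and `some_operation` are hypothetical stand-ins for the real code, and the problem size is kept small so it runs quickly:

```python
import cProfile
import io
import pstats

def some_operation(g_node, h_node):
    # Hypothetical placeholder metric; replace with the real calculation.
    return abs(g_node - h_node)

def build_scores(n):
    # Same nested-loop structure as in the question, on n x n pairs.
    dic_score = {}
    for g in range(n):
        for h in range(n):
            dic_score.setdefault(g, []).append([h, some_operation(g, h), -1])
    return dic_score

profiler = cProfile.Profile()
profiler.enable()
result = build_scores(200)
profiler.disable()

# Print the five entries with the highest cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```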
You’ll see the table: