I’m learning about graphs (they seem super useful) and was wondering if I could get some advice on a possible way to structure my graphs.
Simply put, let’s say I get purchase order data every day, and on some days it’s the same as the day before while on others it’s different. For example, yesterday I had an order of pencils and erasers, so I created two nodes to represent them; today I get an order for an eraser and a marker, and so on. After each day, my program also looks at who ordered what, and if Bob ordered a pencil yesterday and an eraser today, it creates a directed edge. My logic is that I can see who bought what on each day and track Bob’s purchase behaviour (and maybe use it to infer patterns for him or for other users).
My problem is that I’m using networkx (Python), creating a node ‘pencil’ for yesterday and then another node ‘pencil’ for day 2, and I can’t tell them apart.
I thought of (and have been) naming it day2-pencil, then scanning the entire graph and stripping out the ‘day2-’ prefix to track pencil orders. This seems wrong to me (not to mention expensive on the processor). I think the key would be to somehow mark each day as its own subgraph, so that when I want to study a specific day or a few days, I don’t have to scan the entire graph.
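One common alternative to day-prefixed strings (an assumption here, not something the question or the answer below uses) is to make each node key a (day, product) tuple, since networkx accepts any hashable object as a node. Keeping a small per-day index of node keys then lets G.subgraph return a view of one day without parsing strings. The helper add_order and the index nodes_by_day are hypothetical names for illustration.

```python
from collections import defaultdict

import networkx as nx

G = nx.DiGraph()
# Hypothetical index: day -> set of node keys created on that day.
nodes_by_day = defaultdict(set)


def add_order(day, product):
    """Add a (day, product) node; tuples are hashable, so they work as node keys."""
    node = (day, product)
    G.add_node(node, day=day, product=product)
    nodes_by_day[day].add(node)
    return node


# Bob buys a pencil on day 1 and an eraser on day 2.
yesterday = add_order("1/1/12", "pencil")
today = add_order("1/2/12", "eraser")
add_order("1/2/12", "pencil")  # a distinct node from ("1/1/12", "pencil")
G.add_edge(yesterday, today, customer="Bob")

# A subgraph view of one day's nodes, found via the index instead of scanning G.
day2 = G.subgraph(nodes_by_day["1/2/12"])
print(sorted(day2.nodes))

# All pencil nodes, read from node attributes rather than parsed from strings
# (this still visits every node; a second index by product would avoid that).
pencils = [n for n, d in G.nodes(data=True) if d["product"] == "pencil"]
print(pencils)
```

The subgraph is a read-only view that shares data with G, so creating it does not copy the graph.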
As my test data gets larger, it’s getting more and more confusing, so I am wondering what the best practice is. Any general suggestions would be great (networkx seems pretty full-featured, so it probably has a way of doing this).
Thanks in advance!
Update: Still no luck, but this may be helpful:
```python
import networkx as nx

G = nx.Graph()
G.add_node('pencil', day='1/1/12', colour='blue')
G.add_node('eraser', day='1/1/12', colour='rubberish colour. I know thats not a real colour')
# Same node key 'pencil' as above: this updates the existing node's attributes
G.add_node('pencil', day='1/2/12', colour='blue')
```
Typing G.node (removed in networkx 2.4; dict(G.nodes(data=True)) shows the same thing in newer versions) gives:
```
{'pencil': {'colour': 'blue', 'day': '1/2/12'}, 'eraser': {'colour': 'rubberish colour. I know thats not a real colour', 'day': '1/1/12'}}
```
It’s obviously overwriting the pencil from 1/1/12 with the 1/2/12 one, and I’m not sure how to make a distinct one.
This mostly depends on your goal, actually: what you want to analyze is the deciding factor in your graph design. But looking at your structure, a general design would be nodes for Customers and Products that are connected by Days (I don’t know if this helps you any more, but this is in fact a bipartite graph). So your structure would be something like this:
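A minimal sketch of that layout, assuming a MultiDiGraph where each purchase is an edge from a customer to a product and the day is stored in a day edge attribute (the attribute name and the sample data are illustrative). The bipartite node attribute set to 0 or 1 is the convention the networkx bipartite documentation recommends for marking the two node sets.

```python
import networkx as nx
from networkx.algorithms import bipartite

G = nx.MultiDiGraph()
# One side of the bipartite graph: customers. The other side: products.
G.add_nodes_from(["Bob", "Alice"], bipartite=0)
G.add_nodes_from(["Pencil", "Eraser", "Marker"], bipartite=1)

# Each purchase is an edge customer -> product labelled with its day.
G.add_edge("Bob", "Pencil", day="1/1/12")
G.add_edge("Alice", "Eraser", day="1/1/12")
G.add_edge("Alice", "Marker", day="1/2/12")

# Orders only ever connect a customer to a product, so the graph is bipartite.
print(bipartite.is_bipartite(G))
```

Because days live on the edges, the same Pencil node is reused every day, which sidesteps the day2-pencil naming problem from the question.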
Let’s say Bob buys a pencil on 1/1/12:
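In networkx terms, this first purchase could look like the sketch below, assuming customers and products are plain string nodes and the date goes into a day edge attribute (both choices are assumptions for illustration):

```python
import networkx as nx

G = nx.MultiDiGraph()
# Bob buys a pencil on 1/1/12: one customer -> product edge carrying the day.
G.add_edge("Bob", "Pencil", day="1/1/12")

print(list(G.edges(data=True)))  # [('Bob', 'Pencil', {'day': '1/1/12'})]
print(G.number_of_edges("Bob", "Pencil"))  # 1
```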
Ok, now Bob goes and buys another pencil on 1/2/12:
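A sketch of the second purchase, under the same assumptions (string nodes, a day edge attribute). It also contrasts a MultiDiGraph with a plain DiGraph, where the second add_edge call would simply update the attributes of the one existing Bob -> Pencil edge, much like the node overwrite in the question.

```python
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Bob", "Pencil", day="1/1/12")
# The next day Bob buys another pencil. A MultiDiGraph stores this as a
# second, parallel edge instead of overwriting the first one.
G.add_edge("Bob", "Pencil", day="1/2/12")

print(G.number_of_edges("Bob", "Pencil"))  # 2
print(sorted(d["day"] for _, _, d in G.edges(data=True)))  # ['1/1/12', '1/2/12']

# For contrast: a plain DiGraph keeps one edge and replaces its attributes.
H = nx.DiGraph()
H.add_edge("Bob", "Pencil", day="1/1/12")
H.add_edge("Bob", "Pencil", day="1/2/12")
print(H.number_of_edges(), H["Bob"]["Pencil"])  # 1 {'day': '1/2/12'}
```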
And so on.
This is actually possible with networkx. Since you have multiple edges between the same pair of nodes, you have to choose between MultiGraph and MultiDiGraph, depending on whether your edges are directed. So far, not bad. You can actually query things like “Did Alice buy a Pencil on 1/1/12?”
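A question like that only needs the parallel edges between one customer and one product, so it does not touch the rest of the graph. A sketch, assuming each purchase edge carries a day attribute; bought_on is a hypothetical helper name:

```python
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Alice", "Pencil", day="1/1/12")
G.add_edge("Alice", "Pencil", day="1/3/12")
G.add_edge("Bob", "Eraser", day="1/1/12")


def bought_on(G, customer, product, day):
    """Hypothetical helper: True if any customer -> product edge has this day."""
    # For a multigraph, get_edge_data returns {edge_key: attr_dict}, or the
    # default when there is no edge between the two nodes at all.
    edges = G.get_edge_data(customer, product, default={})
    return any(attrs.get("day") == day for attrs in edges.values())


print(bought_on(G, "Alice", "Pencil", "1/1/12"))  # True
print(bought_on(G, "Alice", "Pencil", "1/2/12"))  # False
print(bought_on(G, "Alice", "Eraser", "1/1/12"))  # False
```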
Things might get bad if you want all orders on a specific day. By bad, I don’t mean code-wise but computation-wise. Code-wise it is rather simple:
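A sketch of that filter, assuming purchase edges carry a day attribute (the sample data is illustrative):

```python
import networkx as nx

G = nx.MultiDiGraph()
G.add_edge("Bob", "Pencil", day="1/1/12")
G.add_edge("Bob", "Pencil", day="1/2/12")
G.add_edge("Alice", "Eraser", day="1/1/12")
G.add_edge("Alice", "Marker", day="1/2/12")

# Every (customer, product) order on 1/2/12. Note this visits every edge in G.
orders = [(u, v) for u, v, d in G.edges(data=True) if d.get("day") == "1/2/12"]
print(orders)
```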
But this scans all the edges in the network and filters the ones you want. I don’t think networkx has any better way.
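If that scan becomes a bottleneck, one workaround outside networkx itself (an assumption, not something the library provides) is to keep your own dictionary from day to edge keys, filled in as edges are added. G.edge_subgraph then gives a view of just that day’s orders. add_purchase and edges_by_day are hypothetical names; the sketch relies on MultiDiGraph.add_edge returning the key it assigned to the new edge.

```python
from collections import defaultdict

import networkx as nx

G = nx.MultiDiGraph()
# Hypothetical secondary index: day -> list of (customer, product, key) edges.
edges_by_day = defaultdict(list)


def add_purchase(customer, product, day):
    """Add a purchase edge and record it in the per-day index."""
    key = G.add_edge(customer, product, day=day)  # returns the new edge's key
    edges_by_day[day].append((customer, product, key))


add_purchase("Bob", "Pencil", "1/1/12")
add_purchase("Bob", "Pencil", "1/2/12")
add_purchase("Alice", "Marker", "1/2/12")

# Looking up one day is now a dict access instead of a scan over all edges.
day2 = G.edge_subgraph(edges_by_day["1/2/12"])
print(sorted(day2.edges(keys=True)))
```

The trade-off is bookkeeping: if edges are ever removed from G, the index has to be updated in the same place, or it will point at edges that no longer exist.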