This is a follow-up to the question (Link)
What I intend on doing is using the XML to create a graph using NetworkX. Looking at the DOM structure below, all nodes within the same node should have an edge between them, and all nodes that have attended the same conference should have a node to that conference. To summarize, all authors that worked together on a paper should be connected to each other, and all authors who have attended a particular conference should be connected to that conference.
<conference name="CONF 2009">
<paper>
<author>Yih-Chun Hu(UIUC)</author>
<author>David McGrew(Cisco Systems)</author>
<author>Adrian Perrig(CMU)</author>
<author>Brian Weis(Cisco Systems)</author>
<author>Dan Wendlandt(CMU)</author>
</paper>
<paper>
<author>Dan Wendlandt(CMU)</author>
<author>Ioannis Avramopoulos(Princeton)</author>
<author>David G. Andersen(CMU)</author>
<author>Jennifer Rexford(Princeton)</author>
</paper>
</conference>
I’ve figured out how to connect authors to conferences, but I’m unsure about how to connect authors to each other. The thing that I’m having difficulty with is how to iterate over the authors that have worked on the same paper and connect them together.
dom = parse(filepath)
conference=dom.getElementsByTagName('conference')
for node in conference:
conf_name=node.getAttribute('name')
print conf_name
G.add_node(conf_name)
#The nodeValue is split in order to get the name of the author
#and to exclude the university they are part of
plist=node.getElementsByTagName('paper')
for p in plist:
author=str(p.childNodes[0].nodeValue)
author= author.split("(")
#Figure out a way to create edges between authors in the same <paper> </paper>
alist=node.getElementsByTagName('author')
for a in alist:
authortext= str(a.childNodes[0].nodeValue).split("(")
if authortext[0] in dict:
edgeQuantity=dict[authortext[0]]
edgeQuantity+=1
dict[authortext[0]]=edgeQuantity
G.add_edge(authortext[0],conf_name)
#Otherwise, add it to the dictionary and create an edge to the conference.
else:
dict[authortext[0]]= 1
G.add_node(authortext[0])
G.add_edge(authortext[0],conf_name)
i+=1
You need to generate (author, otherauthor) pairs so you can add them as edges. The typical way to do that would be a nested iteration:
This is a naïve implementation that includes self-loops (giving an author an edge connecting himself to himself), which you may or may not want; it also includes both (1,2) and (2,1), which if you’re doing an undirected graph is redundant. (In Python 2.6, the built-in
permutationsgenerator also does this.) Here’s a generator that fixes these things:I’ve not used NetworkX, but looking at the doc it seems to say you can call add_node on the same node twice (with nothing happening the second time). If so, you can discard the dict you were using to try to keep track of what nodes you’d inserted. Also, it seems to say that if you add an edge to an unknown node, it’ll add that node for you automatically. So it should be possible to make the code much shorter: