While trying to answer the question "Graph isomorphism for jar files", a further question naturally arose: how to represent a jar file as a graph using Python.
The problem: given a jar file, read the files contained within it and represent the contents as (a) a data structure and (b) a graphic, both suitable for further study and manipulation, for example assessing isomorphism with another jar file. In the graph, the directories should form the root and branch nodes of a tree, with the files as leaf nodes.
To standardise the answer I use the verletphysics.jar file downloaded from this OpenProcessing sketch.
The Solution
Given that jar files are essentially zip archives, use the zipfile module from the Python standard library to read the contents and prepare textual and graphical representations of the relationships between the contents of the jar.
Textual Representation
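As a rough illustration of the reading step, here is a minimal sketch using zipfile. The helper name list_jar_contents and the in-memory stand-in archive (with made-up member names) are assumptions for the example, not part of the original code; in practice you would pass the path to verletphysics.jar instead.

```python
import io
import zipfile


def list_jar_contents(jar):
    """Return the sorted member names of a jar (zip) archive.

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as archive:
        return sorted(archive.namelist())


# Build a tiny stand-in archive in memory so the sketch runs without
# verletphysics.jar. These member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr("demo/physics/Particle.class", b"")
    archive.writestr("demo/physics/Spring.class", b"")

buffer.seek(0)
contents = list_jar_contents(buffer)
for name in contents:
    print(name)
```

Each printed line is a slash-separated pathname, which is the raw material for both the key and the graph.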
For the file verletphysics.jar, as mentioned in the question, the code below produces this list of contents:
The Key
Each node in the above pathnames is extracted and given a unique id by the code, as below:
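One way such ids could be assigned is sketched here. It assumes that a node is identified by its full path prefix (so that two directories with the same name in different places stay distinct) and that ids are handed out in order of first appearance; the function name assign_node_ids and the sample pathnames are hypothetical, and the full code may use a different scheme.

```python
def assign_node_ids(paths):
    """Map every distinct path prefix (directory or file) to an integer id."""
    ids = {}
    for path in paths:
        # strip("/") copes with directory entries such as "demo/".
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            node = "/".join(parts[:depth])
            # setdefault leaves an existing id untouched, so each node
            # keeps the id it received when it was first seen.
            ids.setdefault(node, len(ids))
    return ids


# Hypothetical pathnames standing in for the real jar listing.
sample_paths = [
    "META-INF/MANIFEST.MF",
    "demo/physics/Particle.class",
    "demo/physics/Spring.class",
]
node_ids = assign_node_ids(sample_paths)
for node, node_id in node_ids.items():
    print(node_id, node)
```

Using short integer ids as graph labels keeps the plot readable, with the key serving as the lookup table back to the pathnames.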
The Graph
The pathnames are translated into edges that are built up into this network using NetworkX and plotted with matplotlib.
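The translation from pathnames to edges might look like the following sketch. The function names, the root label "jar" and the sample pathnames are assumptions for illustration. The NetworkX and matplotlib imports sit inside draw_tree so that the edge-building helper can be tried without either library installed.

```python
def path_edges(paths, root="jar"):
    """Turn slash-separated pathnames into sorted (parent, child) edges.

    Every top-level entry hangs off a single root node, and each node
    is named by its full path prefix.
    """
    edges = set()
    for path in paths:
        parent = root
        prefix = ""
        for part in path.strip("/").split("/"):
            prefix = f"{prefix}/{part}" if prefix else part
            edges.add((parent, prefix))
            parent = prefix
    return sorted(edges)


def draw_tree(edges):
    """Build a directed graph from the edges and plot it."""
    # Third-party imports are deferred until a plot is requested.
    import matplotlib.pyplot as plt
    import networkx as nx

    graph = nx.DiGraph()
    graph.add_edges_from(edges)
    nx.draw(graph, with_labels=True, node_size=300, font_size=8)
    plt.show()
    return graph


# Hypothetical pathnames standing in for the real jar listing.
sample_paths = [
    "META-INF/MANIFEST.MF",
    "demo/physics/Particle.class",
    "demo/physics/Spring.class",
]
edges = path_edges(sample_paths)
for parent, child in edges:
    print(parent, "->", child)
```

Calling draw_tree(edges) opens a matplotlib window showing the tree. Because the result is a tree, the number of edges is always one less than the number of nodes, which is a quick sanity check on the edge list.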
The Code