I have a list of protein pairs in a text file, in the format below:
ATF-1 MET4
ATF-1 NFE2L1
ATF-2 ATF-7
ATF-2 B-ATF
ARR1 ARR1
ARR1 CHOP
I want to read the pairs from the text file and represent them as an undirected graph using adjacency lists, in either Java or Perl. I want to calculate the minimum and maximum number of edges, the shortest and longest paths between nodes, and other similar functions.
In Perl, you can represent the graph using a hash of hashes like this:
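A minimal sketch of the hash-of-hashes idea, assuming one whitespace-separated pair per line. The sub name build_graph and the inline sample data are made up for illustration; the script reads the file named on the command line if one is given, and otherwise falls back to the sample pairs from the question.

```perl
use strict;
use warnings;

# Build an undirected graph as a hash of hashes from lines of "NODE1 NODE2".
# build_graph is a hypothetical helper name, not part of any module.
sub build_graph {
    my ($fh) = @_;
    my %graph;
    while (my $line = <$fh>) {
        my ($u, $v) = split ' ', $line;
        next unless defined $v;    # skip blank or malformed lines
        $graph{$u}{$v} = 1;
        $graph{$v}{$u} = 1;        # undirected: store the edge in both directions
    }
    return %graph;
}

# Sample pairs from the question, used only when no file name is given.
my $sample = join "\n",
    'ATF-1 MET4', 'ATF-1 NFE2L1', 'ATF-2 ATF-7',
    'ATF-2 B-ATF', 'ARR1 ARR1', 'ARR1 CHOP', '';

# A scalar reference opens as an in-memory file; a string opens as a file name.
my $source = @ARGV ? $ARGV[0] : \$sample;
open my $fh, '<', $source or die "Cannot open input: $!";
my %graph = build_graph($fh);
close $fh;

# Print each node followed by its adjacency list.
for my $node (sort keys %graph) {
    print "$node: ", join(', ', sort keys %{ $graph{$node} }), "\n";
}
```

Storing each edge under both endpoints keeps neighbour lookups cheap, and using a hash for the neighbours means a repeated line in the file does not create a duplicate edge. Note that ARR1 ARR1 is a self-loop, so ARR1 lists itself as a neighbour.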
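Run it like this, assuming the script is saved as graph.pl and takes the input file name as its argument (both names below are placeholders):
perl graph.pl proteins.txt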
For the shortest path, you’ll have to decide which start and end nodes you want to search between.
For this you can use Dijkstra’s algorithm. In a nutshell, this is how the algorithm works:
Let’s call the start node A and the end node B.
Assume that we already know the shortest path from A to B. If we are at B, then backtracking our steps along the cheapest path should bring us back to A. Dijkstra’s algorithm starts at A and records the cost of reaching each of A’s adjacent nodes, then repeats the process from the cheapest node not yet processed, keeping for every node the lowest cost found so far and the node it was reached from. Once done, we can print the shortest path from A to B by backtracking from B to A.
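The idea above can be sketched as follows. Because the edges in this data carry no weights, every edge costs 1, and in that case Dijkstra’s algorithm reduces to a plain breadth-first search, which is what this sketch uses instead of a priority queue. The sub name shortest_path and the %prev bookkeeping hash are illustrative choices, and the graph is rebuilt inline so the example runs on its own.

```perl
use strict;
use warnings;

# Sample graph from the question as a hash of hashes (both directions stored).
my %graph;
for my $pair (['ATF-1', 'MET4'], ['ATF-1', 'NFE2L1'], ['ATF-2', 'ATF-7'],
              ['ATF-2', 'B-ATF'], ['ARR1', 'ARR1'], ['ARR1', 'CHOP']) {
    my ($u, $v) = @$pair;
    $graph{$u}{$v} = 1;
    $graph{$v}{$u} = 1;
}

# Return the nodes on a shortest path from $start to $end,
# or an empty list if $end cannot be reached.
sub shortest_path {
    my ($g, $start, $end) = @_;
    return () unless exists $g->{$start} && exists $g->{$end};

    my %prev  = ($start => undef);   # node => node it was first reached from
    my @queue = ($start);
    while (@queue) {
        my $node = shift @queue;
        last if $node eq $end;
        for my $next (sort keys %{ $g->{$node} }) {
            next if exists $prev{$next};   # already reached by a path at least as short
            $prev{$next} = $node;
            push @queue, $next;
        }
    }
    return () unless exists $prev{$end};

    # Backtrack from the end node to the start node.
    my @path;
    for (my $n = $end; defined $n; $n = $prev{$n}) {
        unshift @path, $n;
    }
    return @path;
}

print join(' -> ', shortest_path(\%graph, 'MET4', 'NFE2L1')), "\n";
# prints "MET4 -> ATF-1 -> NFE2L1"
```

The sample data splits into three separate groups, so a query such as ATF-1 to CHOP returns an empty list. If the edges later get weights, the queue would need to be replaced by one that always yields the cheapest unprocessed node.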
To get the number of nodes:
print scalar keys %graph;
To get the number of edges, you’ll have to count the entries in each of the hash elements, counting each edge only once since it is stored under both of its endpoints. For example, to count the number of edges for one node:
print scalar keys %{$graph{'ATF-1'}};
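To count every edge in the whole graph exactly once, one option is to count a pair only when its first node sorts at or before its second. The sketch below does that and also reports the smallest and largest neighbour counts, which is one reading of "minimum and maximum number of edges" in the question. The sub name count_edges is hypothetical, and the graph is rebuilt inline so the example runs on its own.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Sample graph from the question as a hash of hashes (both directions stored).
my %graph;
for my $pair (['ATF-1', 'MET4'], ['ATF-1', 'NFE2L1'], ['ATF-2', 'ATF-7'],
              ['ATF-2', 'B-ATF'], ['ARR1', 'ARR1'], ['ARR1', 'CHOP']) {
    my ($u, $v) = @$pair;
    $graph{$u}{$v} = 1;
    $graph{$v}{$u} = 1;
}

# Count each undirected edge once, including self-loops.
sub count_edges {
    my ($g) = @_;
    my $count = 0;
    for my $u (keys %$g) {
        for my $v (keys %{ $g->{$u} }) {
            # An edge u-v appears under both u and v; keep only the copy
            # where u sorts first. A self-loop (u eq v) appears once.
            $count++ if $u le $v;
        }
    }
    return $count;
}

# Number of neighbours per node.
my %degree;
for my $node (keys %graph) {
    $degree{$node} = scalar keys %{ $graph{$node} };
}

print "nodes: ", scalar(keys %graph), "\n";
print "edges: ", count_edges(\%graph), "\n";
print "fewest neighbours: ", min(values %degree), "\n";
print "most neighbours: ", max(values %degree), "\n";
```

For the sample data this reports 8 nodes and 6 edges, with every node having either 1 or 2 neighbours. ARR1 counts itself as one of its neighbours because of the ARR1 ARR1 line.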