I have a 3-dimensional dataset that describes gene interactions, which can be formulated as a graph. A sample of the dataset is:
a + b
b + c
c - f
b - d
a + c
f + g
g + h
f + h
‘+’ indicates that the gene on the left side positively regulates the gene on the right. In this data I want to count the sub-graphs where a gene (say, x) positively regulates another gene (say, y), and y in turn positively regulates yet another gene (say, z). Furthermore, z is also positively regulated by x. There are two such cases in the graph above (a, b, c and f, g, h). I would prefer to perform this search using awk, but any scripting language is fine. My apologies if the question is too specific, and thanks in advance for the help.
Note: See the information regarding Graphviz below.
This should give you a starting point:
Edit: This version handles genes that are described by more than one character.
In the BEGIN clause, set regdelim to a character that doesn’t appear in your data.

I’ve omitted the processing code for the minus data.
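For comparison, the same search can be sketched in Python, since the question allows any scripting language. This is only a minimal sketch and not the AWK script described above: the names `parse_edges` and `find_feed_forward` are made up for illustration, and it assumes whitespace-separated lines of the form "gene sign gene" as in the question. Like the AWK version, it ignores the minus data, and gene names may be longer than one character.

```python
from collections import defaultdict

# Sample data copied from the question.
SAMPLE = """\
a + b
b + c
c - f
b - d
a + c
f + g
g + h
f + h
"""

def parse_edges(text):
    """Return a list of (source, sign, target) tuples, one per valid line."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        edges.append(tuple(fields))
    return edges

def find_feed_forward(edges):
    """Find triples (x, y, z) such that x + y, y + z and x + z all exist."""
    plus = defaultdict(set)
    for src, sign, dst in edges:
        if sign == "+":  # the minus edges are ignored here
            plus[src].add(dst)
    hits = []
    for x in sorted(plus):
        for y in sorted(plus[x]):
            # .get() avoids adding keys to the dict while searching
            for z in sorted(plus.get(y, ())):
                if z in plus[x] and len({x, y, z}) == 3:
                    hits.append((x, y, z))
    return hits

if __name__ == "__main__":
    hits = find_feed_forward(parse_edges(SAMPLE))
    for x, y, z in hits:
        print(x, y, z)
    print(len(hits), "match(es)")
```

On the sample from the question this finds the two triples a, b, c and f, g, h.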
Output:
Edit 2:
The version below allows you to search for arbitrary combinations. It generalizes the technique used in the original version so no code needs to be duplicated. It also fixes a couple of other limitations.

This can be called like this to do an exhaustive search:
For this data:
The output of the shell loop calling the new version of the script would look like this:
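The idea of searching every sign combination can also be sketched in Python. Again, this is a rough illustration rather than the script above: `count_patterns` is a hypothetical name, the data is the small sample from the question (not the extended data), and it assumes each ordered pair of genes appears at most once in the input.

```python
from collections import defaultdict
from itertools import product

# Sample data from the question as (source, sign, target) tuples.
SAMPLE_EDGES = [
    ("a", "+", "b"), ("b", "+", "c"), ("c", "-", "f"), ("b", "-", "d"),
    ("a", "+", "c"), ("f", "+", "g"), ("g", "+", "h"), ("f", "+", "h"),
]

def count_patterns(edges, signs="+-"):
    """Count triples 'x s1 y, y s2 z, x s3 z' for every combination of signs.

    Assumes each ordered gene pair occurs at most once; if a pair is
    repeated, the last occurrence wins.
    """
    sign_of = {(src, dst): sign for src, sign, dst in edges if sign in signs}
    targets = defaultdict(list)
    for src, dst in sign_of:
        targets[src].append(dst)

    # One counter per (s1, s2, s3), so combinations with no match report 0.
    counts = {combo: 0 for combo in product(signs, repeat=3)}
    for (x, y), s1 in sign_of.items():
        for z in targets.get(y, ()):
            if (x, z) in sign_of and len({x, y, z}) == 3:
                counts[(s1, sign_of[(y, z)], sign_of[(x, z)])] += 1
    return counts

if __name__ == "__main__":
    for (s1, s2, s3), n in count_patterns(SAMPLE_EDGES).items():
        print("x {} y, y {} z, x {} z: {}".format(s1, s2, s3, n))
```

Keeping one dictionary keyed by the sign triple means the eight cases share a single loop, which is the same "no duplicated code" goal as the generalized script.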
Edit 3:
Graphviz
Another approach would be to use Graphviz. The DOT language can describe the graph and gvpr, which is an "AWK-like"¹ programming language, can analyze and manipulate DOT files.

Given the input data in the format as shown in the question, you can use the following AWK program to convert it to DOT:
The command to run would be something like this:
You can then create the graphic above using:
I used the extended data as given above in this answer.
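If you would rather do the conversion step in Python, a minimal sketch might look like the following. The function name `to_dot` and the choice to store the sign in a `label` attribute are my own assumptions for illustration and may differ from what the AWK program emits. Gene names are wrapped in double quotes so that names containing unusual characters are still valid DOT identifiers; names that themselves contain a double quote are not handled.

```python
def to_dot(text, name="G"):
    """Convert 'gene sign gene' lines into a DOT digraph, one edge per line."""
    lines = ["digraph {} {{".format(name)]
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        src, sign, dst = fields
        # Keep the regulation sign as an edge label.
        lines.append('    "{}" -> "{}" [label="{}"];'.format(src, dst, sign))
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    sample = "a + b\nb + c\nc - f\n"
    print(to_dot(sample))
```

Saving that output to a file (say, genes.dot, a placeholder name) lets you render it with something like `dot -Tpng genes.dot -o genes.png`.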
To do an exhaustive search for the type of subgraphs you specified, you can use the following gvpr program:

To run it, you could use:
The output would be similar to that from the AWK/shell combination above (under "Edit 2"):
¹ Loosely speaking.