Genes Underlying Inheritance Linked Disorders
The past decade has witnessed dramatic advances in genome sequencing and a substantial increase in the number of genome-wide association studies (GWAS). These efforts have considerably expanded our knowledge of sequence variation in human DNA and its consequences for human biology. Complex genetic disorders often involve the products of multiple genes acting cooperatively. Nevertheless, pinpointing the decisive elements of such disease pathways remains a challenge. Recently, network biology has proven useful in identifying candidate genes associated with a disease, based on the simple observation that proteins translated from phenotypically related genes tend to interact, the so-called guilt-by-association principle. Here, we present GUILD (Genes Underlying Inheritance Linked Disorders), a network-based prioritization framework to unveil genes associated with a disease phenotype (disease-genes). In GUILD, we exploit several communication mechanisms between disease-genes emerging from the topology of the interaction network. We used three sources of gene-phenotype association to specify the nodes involved in a disorder, including the Online Mendelian Inheritance in Man database and previously published data sets. Analyses on multiple human disease phenotypes demonstrated that the methods proposed in GUILD effectively prioritize genes even when the linkage information is not known a priori. We compared the algorithms we developed with state-of-the-art prioritization methods such as PageRank with priors, Functional Flow, random walk with restart and network propagation. We also tested the robustness of the approaches, showing the effect of the network properties and the independence of the results from the number of original genes/proteins associated with the function or phenotype. Finally, we applied GUILD to prioritize genes in the case of Alzheimer's disease.
GUILD (Genes Underlying Inheritance Linked Disorders) is a framework built for the prioritization of disease candidate genes using a priori gene-disease associations and protein interactions. GUILD consists of implementations of 8 algorithms: NetScore, NetZcore, NetShort, NetCombo, fFlow, NetRank, NetWalk and NetProp. NetScore, NetZcore, NetShort, fFlow and NetRank are implemented in C++, NetWalk and NetProp are implemented in R, and NetCombo is a Python script combining the results of NetScore, NetZcore and NetShort. In this manual, we describe how to use the programs included in the GUILD framework. See the GitHub repository for the latest version of the code. For the Python scripts used to create input files, see guild_utilities in the toolbox package.
The programs needed for the installation (GCC and make) typically ship with Unix-like operating systems. If not, they are freely available online. Note that Windows users can obtain the same build environment (GCC and make) through MinGW ( http://www.mingw.org ) or Cygwin ( http://www.cygwin.com ). R is a free software environment for statistical computing and graphics, available at http://www.r-project.org/.
Download the source package guild.tar.gz and unpack it, e.g. as follows:
{bash}
\$> tar xvzf guild.tar.gz
Then, go to the extracted directory
{bash}
\$> cd guild
Next, go to the src folder and run the make command as shown below. Note
that the make command in MinGW can have a different name (e.g.
mingw32-make.exe).
macOS users: use make -f Makefile.mac instead of the make command
below (available in the GitHub version of the code).
{bash}
\$> cd src
\$> make
An executable named guild should be created under the ``guild'' folder.
Try running it as follows
{bash}
\$> cd ..
\$> ./guild
If you get the following output when you run it, the installation has completed
successfully.
{bash}
./guild [ Copyleft (GPLv3) - 2011 - Emre Guney (Universitat Pompeu Fabra) ]
Arguments:
-s <prioritization_method>{NetScore:s|NetZcore:z|NetShort:d|fFlow:f|NetRank:r}
-n <node_file>
-e <edge_file>
-o <output_file>
-i <number_of_iterations>
-r <number_of_repetitions>
-t <seed_score_threshold>
-x <number_of_sampled_graphs>
-d <sampled_graph_prefix>
-h
Otherwise, make sure that you have recent versions of GCC and
make installed, check the steps above and retry compiling.
If you encounter compilation issues due to the Boost code, try compiling with a newer version of the Boost library (by changing the Boost library path in the Makefile).
For algorithms implemented in C++, a typical GUILD call consists of several mandatory arguments (such as the names of the input/output files and the type of the prioritization method) followed by arguments specific to the prioritization method. Mandatory arguments common to all prioritization methods are explained below; method specific arguments are described separately for each method in the later sections. The possible arguments for a GUILD executable call are as follows:
{bash}
\$> ./guild -s <prioritization_method> -n <node_file> -e <edge_file> -o <output_file> \
    -i <number_of_iterations> -r <number_of_repetitions> -t <seed_score_threshold> \
    -x <number_of_sampled_graphs> -d <sampled_graph_prefix>
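where the file arguments use the following formats. The node file (node_file) contains one node and its score per line:
{text}
<node_id> <node_score>
The edge file (edge_file) contains one interaction and its score per line:
{text}
<node_id> <edge_score> <node_id>
The output file (output_file) is written with one node and its calculated score per line:
{text}
<node_id> <node_score>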
For algorithms implemented in R (NetWalk and NetProp), a typical GUILD call would look like:
{bash}
\$> R --slave --args <node_file> <edge_file> <output_file> <use_propagation> < random_walk.r
where all arguments are as explained before except ``use_propagation'', which, if provided, switches the algorithm from NetWalk to NetProp.
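The node, edge and output file formats described above are simple enough to produce and read with a few lines of Python. The following is a minimal sketch, assuming Python 3, whitespace-separated fields, and that a higher score in the output file means a higher priority; the helper names and the toy gene identifiers are hypothetical and are not part of GUILD.

```python
import io


def write_node_scores(handle, node_scores):
    """Write one '<node_id> <node_score>' line per node."""
    for node_id, score in node_scores.items():
        handle.write("{} {}\n".format(node_id, score))


def write_edge_scores(handle, edges):
    """Write one '<node_id> <edge_score> <node_id>' line per interaction."""
    for source, score, target in edges:
        handle.write("{} {} {}\n".format(source, score, target))


def read_node_scores(handle):
    """Parse '<node_id> <node_score>' lines into a dict, skipping blank lines."""
    scores = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        node_id, score = fields
        scores[node_id] = float(score)
    return scores


def top_ranked(scores, count):
    """Return the `count` node ids with the highest scores (ties broken by id)."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [node_id for node_id, _ in ordered[:count]]


# Hypothetical toy data: identifiers and scores are made up for illustration.
node_buffer = io.StringIO()
write_node_scores(node_buffer, {"geneA": 1.0, "geneB": 0.01})
edge_buffer = io.StringIO()
write_edge_scores(edge_buffer, [("geneA", 1.0, "geneB")])
```

The same functions work on real files opened with `open(path, "w")` or `open(path)`; `read_node_scores` can be pointed at an output file to list the top candidates with `top_ranked`.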
Method specific parameters for NetScore are the number of repetitions (-r) and the number of iterations (-i).
The following is an example call to run the NetScore algorithm using the node file data/test_proteins.txt and the edge file data/test_interactions.txt, with the number of iterations and repetitions set to 2 and 3 respectively, writing the calculated scores to a file named output.txt.
{bash}
\$> ./guild -s s -n data/test_proteins.txt -e data/test_interactions.txt -o output.txt -r 3 -i 2
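When many runs need to be scripted, the command line above can be assembled programmatically. The sketch below is one possible wrapper, assuming Python 3 and that the guild executable sits in the current directory; the function names are hypothetical, and only the flags printed in the usage message (-s, -n, -e, -o, -i, -r, -t, -x, -d) are used.

```python
import subprocess

# Method codes as listed in the usage message of the guild executable.
METHOD_CODES = {
    "NetScore": "s",
    "NetZcore": "z",
    "NetShort": "d",
    "fFlow": "f",
    "NetRank": "r",
}

# Optional single-letter flags taken from the usage message.
OPTIONAL_FLAGS = {"i", "r", "t", "x", "d"}


def build_guild_command(method, node_file, edge_file, output_file,
                        executable="./guild", **options):
    """Return the argument list for one guild call (hypothetical helper)."""
    command = [executable, "-s", METHOD_CODES[method],
               "-n", node_file, "-e", edge_file, "-o", output_file]
    for flag, value in options.items():
        if flag not in OPTIONAL_FLAGS:
            raise ValueError("unknown guild flag: -{}".format(flag))
        command += ["-" + flag, str(value)]
    return command


def run_guild(*args, **kwargs):
    """Build the command and execute it, raising if guild exits with an error."""
    subprocess.run(build_guild_command(*args, **kwargs), check=True)


# Reproduces the NetScore example call shown above.
netscore_command = build_guild_command(
    "NetScore", "data/test_proteins.txt", "data/test_interactions.txt",
    "output.txt", r=3, i=2)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.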
Method specific parameters for NetZcore are the number of iterations (-i), the number of sampled graphs (-x) and the prefix of the sampled graph files (-d).
An example call to run the NetZcore algorithm, where the data folder contains 100 randomly generated networks whose file names start with the prefix test_interactions.txt. (e.g. test_interactions.txt.1, test_interactions.txt.2, ..., test_interactions.txt.100), is as follows:
{bash}
\$> ./guild -s z -n data/test_proteins.txt -e data/test_interactions.txt -o output.txt -i 5 \
    -d data/test_interactions.txt. -x 100
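Before starting a NetZcore run it can be useful to confirm that every sampled graph file is in place. The following sketch assumes Python 3 and that the files are named by appending the numbers 1 to the value of -x to the prefix, as in the example file names above; the helper names are hypothetical.

```python
import os


def sampled_graph_paths(prefix, count):
    """List the file names <prefix>1 ... <prefix><count>."""
    return ["{}{}".format(prefix, index) for index in range(1, count + 1)]


def missing_sampled_graphs(prefix, count):
    """Return the expected sampled graph files that do not exist on disk."""
    return [path for path in sampled_graph_paths(prefix, count)
            if not os.path.exists(path)]


# File names matching the -d and -x values of the example call (first three only).
first_three = sampled_graph_paths("data/test_interactions.txt.", 3)
```

An empty list from `missing_sampled_graphs("data/test_interactions.txt.", 100)` indicates that all 100 files used by the example call are present.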
A Python script named ``create_random_networks_for_netzcore.py'' is provided
for creating the random networks used by NetZcore. It requires
Python (version 2.5.2 or higher) and the Python NetworkX package (version
1.1 or higher) to be installed on your system. The following command
creates 100 random networks with the same topology as the given input network
``data/test_interactions.txt'', using the prefix
``data/test_interactions.txt.'' (a dot is appended to the name of the provided
edge scores file).
{bash}
\$> python src/create_random_networks_for_netzcore.py data/test_interactions.txt 100
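The text does not spell out how the random networks are generated. One common way to randomize a network while keeping every node's number of interactions is repeated edge swapping, sketched below in plain Python 3. This is an illustration of that general technique only: whether the provided script works this way is an assumption, the function names are hypothetical, and edge scores are left out for brevity.

```python
import random
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Rewire an undirected edge list by swapping endpoints of edge pairs.

    A swap turns (a, b), (c, d) into (a, d), (c, b), so every node keeps
    its degree. Swaps creating self loops or duplicate edges are skipped.
    """
    rng = random.Random(seed)
    edges = [tuple(edge) for edge in edges]
    if len(edges) < 2:
        return edges
    present = {frozenset(edge) for edge in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap could create a self loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical toy network: a ring of eight nodes.
ring = [(index, (index + 1) % 8) for index in range(8)]
shuffled = degree_preserving_shuffle(ring, n_swaps=10, seed=0)
```

The attempt limit keeps the loop from running forever on small or dense networks where few valid swaps exist.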
There is no method specific parameter for NetShort. Note, however, that the
algorithm uses the phenotypic association scores in the edge scores file
(edge_file) rather than those in the node scores file (e.g. the average of
the scores of the nodes that the edge in question connects). A Python script to
create a NetShort specific edge scores file is provided for convenience (see
below).
An example NetShort run would be:
{bash}
\$> ./guild -s d -n data/test_proteins.txt -e data/test_interactions_for_netshort.txt -o output.txt
A Python script named ``convert_network_for_netshort.py'' is provided
for creating the edge scores file used by NetShort
(where the original edge scores are multiplied by the average of the scores of the
nodes that the edge connects). It requires Python (version 2.5.2 or
higher). The following command converts the original edge scores file
``data/test_interactions.txt'' to a NetShort specific edge scores file
``data/test_interactions_for_netshort.txt'' using the node
score information in ``data/test_proteins.txt''.
{bash}
\$> python src/convert_network_for_netshort.py data/test_proteins.txt data/test_interactions.txt \
    data/test_interactions_for_netshort.txt
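The conversion described above, multiplying each original edge score by the average score of the two nodes it connects, can be sketched as follows. This is a minimal illustration in Python 3, not the provided script: the function name is hypothetical, and it assumes that every node in the edge file also has a score in the node file.

```python
def netshort_edge_scores(node_scores, edges):
    """Scale each edge score by the mean score of the edge's two end nodes.

    `edges` holds (node_id, edge_score, node_id) triples, mirroring the
    column order of the edge scores file.
    """
    converted = []
    for source, score, target in edges:
        mean_node_score = (node_scores[source] + node_scores[target]) / 2.0
        converted.append((source, score * mean_node_score, target))
    return converted


# Hypothetical toy data: identifiers and scores are made up for illustration.
toy_nodes = {"geneA": 1.0, "geneB": 0.0, "geneC": 1.0}
toy_edges = [("geneA", 1.0, "geneB"), ("geneA", 2.0, "geneC")]
toy_converted = netshort_edge_scores(toy_nodes, toy_edges)
```

In the toy data the edge between two high-scoring nodes keeps its full score, while the edge to the low-scoring node is halved.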
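The method specific parameters for fFlow are the number of iterations (-i) and the seed score threshold (-t).
An example call of fFlow, where seed nodes are determined by comparing the node
scores against the seed score threshold given with -t (1.0 here), is: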
{bash}
\$> ./guild -s f -n data/test_proteins.txt -e data/test_interactions.txt -o output.txt -i 5 -t 1.0
The method specific parameter for NetRank is;
Therefore an example run for NetRank is as follows:
{bash}
\$> ./guild -s r -n data/test_proteins.txt -e data/test_interactions.txt -o output.txt
An example run for NetWalk is as follows:
{bash}
\$> R --slave --args data/test_proteins.txt data/test_interactions.txt output.txt < random_walk.r
An example run for NetProp, which passes a fourth argument to enable propagation, is as follows:
{bash}
\$> R --slave --args data/test_proteins.txt data/test_interactions.txt output.txt 1 < random_walk.r
An example run for NetCombo, which combines the output files of NetScore, NetZcore and NetShort, is as follows:
{bash}
\$> python src/combine_scores.py output_netscore.txt output_netzcore.txt output_netshort.txt output.txt
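The manual states only that NetCombo combines the results of NetScore, NetZcore and NetShort. One plausible way to combine scores that live on different scales is to standardize each method's scores and average them per node, as sketched below in Python 3. Treat this as an assumption for illustration rather than a description of combine_scores.py; the function names and the toy scores are hypothetical.

```python
import statistics


def zscore_normalize(scores):
    """Standardize a dict of scores to zero mean and unit variance."""
    values = list(scores.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return {node_id: 0.0 for node_id in scores}
    return {node_id: (score - mean) / spread
            for node_id, score in scores.items()}


def average_normalized_scores(score_sets):
    """Average the standardized scores of the nodes present in every set."""
    normalized = [zscore_normalize(scores) for scores in score_sets]
    common = set.intersection(*(set(scores) for scores in normalized))
    return {node_id: sum(scores[node_id] for scores in normalized) / len(normalized)
            for node_id in sorted(common)}


# Hypothetical toy outputs of two methods on different scales.
method_one = {"geneA": 1.0, "geneB": 0.0}
method_two = {"geneA": 10.0, "geneB": 0.0}
toy_combined = average_normalized_scores([method_one, method_two])
```

Standardizing first keeps a method with a wide score range from dominating the average.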