Disease Module Extraction (DiME)

Download

Sample data

Compiled DiME and R scripts (Windows/Unix)

Source code: commExtrNP.cpp

Citation

Liu, Yunpeng, Daniel A. Tennant, Zexuan Zhu, John K. Heath, Xin Yao, and Shan He. "DiME: A Scalable Disease Module Identification Algorithm with Application to Glioma Progression." PLOS ONE 9(2), 2014.

Tutorial: Using DiME to Extract Modules From Coexpression Networks

In this tutorial, we guide the user through the DiME workflow, using glioblastoma gene expression data as an example. The workflow has three parts:

1. Build a jackknifed coexpression network from pre-processed microarray data and extract modules with the DiME algorithm.
2. Evaluate the statistical significance of the modules with the B-score method (Lancichinetti et al., 2011).
3. Build a second network representing inter-module connectivity, which can easily be visualised in software such as Cytoscape.

The entire process requires only a few commands in R and on the system command line. For interested users, each step of the workflow is explained in detail in comments in the R scripts included in the downloads.
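The exact construction used by DiME.R is documented in the script's comments rather than here. The following is a minimal C++ sketch of one common definition of jackknife correlation. For each pair of genes, it takes the smallest of the Pearson correlations computed with each sample left out in turn, so that a single outlying sample cannot produce a high correlation on its own. The function names are hypothetical, and the sketch assumes at least three samples.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Pearson correlation of x and y, ignoring the sample at index `skip`.
// Pass skip = x.size() to use every sample.
double pearsonExcluding(const std::vector<double>& x,
                        const std::vector<double>& y, std::size_t skip) {
    double sx = 0.0, sy = 0.0;
    std::size_t n = 0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        if (k == skip) continue;
        sx += x[k];
        sy += y[k];
        ++n;
    }
    const double mx = sx / static_cast<double>(n);
    const double my = sy / static_cast<double>(n);
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        if (k == skip) continue;
        const double dx = x[k] - mx, dy = y[k] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;  // constant profile: treat as uncorrelated
    return sxy / std::sqrt(sxx * syy);
}

// Jackknife correlation: the minimum over all leave-one-out correlations.
double jackknifeCorrelation(const std::vector<double>& x,
                            const std::vector<double>& y) {
    double lowest = std::numeric_limits<double>::infinity();
    for (std::size_t skip = 0; skip < x.size(); ++skip)
        lowest = std::min(lowest, pearsonExcluding(x, y, skip));
    return lowest;
}
```

Consider the profiles {1, 2, 1, 2, 10} and {2, 1, 2, 1, 10}. Their ordinary Pearson correlation is about 0.97, driven entirely by the last sample. Their jackknife correlation is -1.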

Here we assume that the user has a preprocessed and cleaned dataset (e.g., an expression matrix). As an example, we provide a glioblastoma gene expression matrix preprocessed from a set of HGU133Plus2 microarray chips with the gcrma package in R. Rows are genes, identified by Entrez IDs (mapped from probesets using the corresponding annotation package). Columns are the 126 samples, collected from various sources; the raw expression data are available from the Rembrandt repository.
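To make the genes-by-samples layout concrete, here is a minimal C++ sketch that reads such a matrix. It assumes the file follows the usual output of R's write.csv:

- a header row whose first (corner) cell is ignored, followed by the sample names;
- then one row per gene, with the Entrez ID first and its expression values after it.

The struct and function names are hypothetical, and quoted fields containing commas are not handled.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory layout: one row per gene, one column per sample.
struct ExpressionMatrix {
    std::vector<std::string> samples;         // column names (sample IDs)
    std::vector<std::string> genes;           // row names (Entrez IDs)
    std::vector<std::vector<double>> values;  // values[gene][sample]
};

// Split one CSV line on commas and strip surrounding double quotes
// (R's write.csv quotes names). Embedded commas/quotes are NOT handled.
std::vector<std::string> splitCsvLine(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // Windows line endings
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string f;
    while (std::getline(ss, f, ',')) {
        if (f.size() >= 2 && f.front() == '"' && f.back() == '"')
            f = f.substr(1, f.size() - 2);
        fields.push_back(f);
    }
    return fields;
}

// Header: (ignored corner cell, sample names...);
// each following row: (Entrez ID, expression values...).
ExpressionMatrix readExpressionCsv(std::istream& in) {
    ExpressionMatrix m;
    std::string line;
    if (!std::getline(in, line)) return m;
    std::vector<std::string> header = splitCsvLine(line);
    if (header.empty()) return m;
    m.samples.assign(header.begin() + 1, header.end());
    while (std::getline(in, line)) {
        std::vector<std::string> f = splitCsvLine(line);
        if (f.empty()) continue;  // skip blank lines
        m.genes.push_back(f[0]);
        std::vector<double> row;
        for (std::size_t k = 1; k < f.size(); ++k) row.push_back(std::stod(f[k]));
        m.values.push_back(row);
    }
    return m;
}
```

With this layout, values[g] is the expression profile of gene g across all samples. That per-gene profile is what a pairwise coexpression measure compares.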

0. Preparatory Work:

Before starting the tutorial, we assume that the user has placed all files from the downloads in the same folder and set the R working directory to that folder:

setwd("PATH_TO_YOUR_FOLDER")  # replace with the path to your folder

To visualise the modules obtained from DiME, we recommend installing Cytoscape.

1. Coexpression Network Construction and Module Extraction:

With all downloads unzipped into an empty folder and the R working directory set to that folder, execute:

# The compiled library is "commExtrNP.dll" on Windows and "commExtrNP.so" on Unix
dynlibfile <- if (.Platform$OS.type == "windows") "commExtrNP.dll" else "commExtrNP.so"
source("DiME.R")

in the R console. This automatically builds a binary coexpression network (written to G4_network.txt) from the glioblastoma expression matrix (G4_expression.csv) and extracts modules with the DiME algorithm. The script writes each module to a separate text file listing the Entrez gene IDs of its members. It also creates a second representation of the coexpression network in which all vertices are labelled with consecutive integers, for use by the B-score algorithm in the next step.
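The paragraph above describes two steps: turning pairwise coexpression values into a binary (unweighted) network, and relabelling vertices with consecutive integers. A minimal C++ sketch of both follows. The fixed correlation threshold is an assumption, and DiME.R may select edges differently. The function names, and the choice to start numbering at 0 (the hypothetical firstId parameter), are illustrative only.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Keep an undirected, unweighted edge i--j whenever the coexpression
// value corr[i][j] reaches `threshold` (a hypothetical cutoff).
std::vector<std::pair<std::size_t, std::size_t>>
binarize(const std::vector<std::vector<double>>& corr, double threshold) {
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t i = 0; i < corr.size(); ++i)
        for (std::size_t j = i + 1; j < corr.size(); ++j)
            if (corr[i][j] >= threshold) edges.emplace_back(i, j);
    return edges;
}

// Relabel an edge list whose vertices are gene IDs (e.g. Entrez IDs) with
// consecutive integers firstId, firstId+1, ... in order of first appearance.
// `names` receives the lookup table: names[k] is the gene with label firstId+k.
std::vector<std::pair<int, int>>
relabel(const std::vector<std::pair<std::string, std::string>>& edges,
        std::vector<std::string>& names, int firstId = 0) {
    std::unordered_map<std::string, int> id;
    auto lookup = [&](const std::string& gene) {
        auto it = id.find(gene);
        if (it != id.end()) return it->second;
        int next = firstId + static_cast<int>(names.size());
        id.emplace(gene, next);
        names.push_back(gene);
        return next;
    };
    std::vector<std::pair<int, int>> out;
    for (const auto& e : edges) {
        int a = lookup(e.first);   // evaluate in a fixed order so numbering
        int b = lookup(e.second);  // is deterministic
        out.emplace_back(a, b);
    }
    return out;
}
```

Keeping the names table alongside the integer edge list lets integer module members be mapped back to Entrez IDs after the B-score step.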

2. Evaluating the Statistical Significance of DiME Modules:

We adopted the B-score significance measure from the work of Lancichinetti et al. (2011). The B-score algorithm is provided as C++ source code and must be compiled before it can be executed. The downloads include the open source code as well as the pre-compiled executables Bscore (Unix) and Bscore.exe (Windows). Users who wish to compile their own executable with gcc can run:

g++ -o Bscore compare.cpp -lgsl -lgslcblas -lm

Note that compiling the source code requires the GNU Scientific Library (GSL) to be installed. The GSL is statically linked into the executables we provide, so users of those executables do not need to install any dependencies.

The user is referred to the paper (Lancichinetti et al., 2011) and the original software’s website for more details.

Assuming that the B-score executable is in the working directory with the data files, we can now calculate the B-scores of the modules. Switch to the working directory on the command line and execute:

./Bscore -f G4_edgelist.txt -c G4_commlist.dat -nobcore

On a Windows machine, replace ./Bscore with Bscore.exe.

The program writes a file named G4_edgelist.txt.table that records the size and B-score of each module. Each module's B-score is averaged over 20 runs.

3. (Optional) Visualising Statistically Significant Modules in Cytoscape:

We can build a network of the interconnections between statistically significant modules to see how the activity of different modules might be coordinately regulated. In R, we write the edge list, edge weights and node attributes to formatted text files, and then import the network into Cytoscape. The downloads also include a non-tumour sample expression matrix (NT_expression.csv), which is used to calculate the fold-change of gene expression as a measure of changes in module activity. To build the module interconnectivity network, execute:

source("buildModuleNetwork.R")

in the R console. The script writes two files, which can be imported into Cytoscape to visualise the modules:

- G4_modnet_edgelist.txt: the edge list of the module network.
- G4_modnet_nodeAttrib.txt: node attributes, including module size and the average log fold-change of expression.

Self-loops in the resulting graph can be removed with the Edit -> Remove Self-loops command in Cytoscape.
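The following minimal C++ sketch shows how the two outputs could be derived. It is not taken from buildModuleNetwork.R, whose exact weighting and attribute definitions may differ.

- Edges: gene-level edges are collapsed into module-level edges. Each edge is weighted by the number of gene pairs joining two modules. Self-loops arise whenever both genes of an edge lie in the same module.
- Node attribute: the average log fold-change is computed on log2-scale data (gcrma returns log2 values by default). Here a gene's log fold-change is the difference between its mean tumour and mean non-tumour expression.

The sketch assumes each gene belongs to at most one module, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Collapse gene-level edges into weighted module-level edges.
// moduleOf[v] is the module of gene v, or -1 if v is in no retained module.
std::map<std::pair<int, int>, int>
moduleEdges(const std::vector<std::pair<int, int>>& geneEdges,
            const std::vector<int>& moduleOf) {
    std::map<std::pair<int, int>, int> weight;
    for (const auto& e : geneEdges) {
        int a = moduleOf[e.first], b = moduleOf[e.second];
        if (a < 0 || b < 0) continue;  // edge touches a gene outside all modules
        if (a > b) std::swap(a, b);    // undirected: store each module pair once
        ++weight[{a, b}];              // a == b produces a self-loop
    }
    return weight;
}

double average(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// tumour[g] and normal[g] hold gene g's log2 expression across samples.
// Module value = mean over member genes of (mean tumour - mean non-tumour).
double moduleLogFoldChange(const std::vector<std::vector<double>>& tumour,
                           const std::vector<std::vector<double>>& normal,
                           const std::vector<std::size_t>& members) {
    double total = 0.0;
    for (std::size_t g : members) total += average(tumour[g]) - average(normal[g]);
    return total / static_cast<double>(members.size());
}
```

Counting intra-module edges as self-loops keeps every gene-level edge accounted for. This is why removing self-loops in Cytoscape is a display choice rather than a loss of information about connections between modules.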