Compiled DiME and R scripts (Windows/Unix)
Source code: commExtrNP.cpp
Liu, Yunpeng, Daniel A. Tennant, Zexuan Zhu, John K. Heath, Xin Yao, and Shan He. "DiME: A Scalable Disease Module Identification Algorithm with Application to Glioma Progression." PLOS ONE, 9(2), 2014.
In this tutorial, we guide the user through the DiME workflow using an example of glioblastoma gene expression data. We build a jackknifed coexpression network from pre-processed microarray data, extract modules using the DiME algorithm, evaluate the statistical significance of the modules with the B-score method (Lancichinetti et al., 2011), and build a second network representing inter-module connectivity, which can be easily visualised using software such as Cytoscape. The user only has to execute a few commands in R and on the system command line to finish the entire process; for interested users, a detailed explanation of each step in the workflow is provided as comments in the R scripts in the downloads.
Here we assume that the potential user has a preprocessed and cleaned dataset (e.g., expression matrix), and provide an example glioblastoma gene expression matrix preprocessed from a set of HGU133Plus2 microarray chips using the gcrma package in R. Rows are genes represented by Entrez IDs (mapped from probesets using the corresponding annotation package), and columns are sample names, representing 126 samples collected from various sources (raw expression data available from the Rembrandt repository).
0. Preparatory Work:
Before starting the tutorial, we assume that the user has placed all files provided in the downloads in the same folder, and has switched the working directory to that folder in R:
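A minimal sketch of this setup step in R. The folder name ~/DiME_tutorial and the helper missing_inputs are hypothetical and used only for illustration; G4_expression.csv is the example expression matrix named later in this tutorial.

```r
# Return the names of required files that are missing from 'dir'.
# 'missing_inputs' is a hypothetical helper, not part of the DiME downloads.
missing_inputs <- function(dir, required = c("G4_expression.csv")) {
  required[!file.exists(file.path(dir, required))]
}

# Hypothetical location; replace with the folder holding the unzipped downloads.
dime_dir <- "~/DiME_tutorial"

if (dir.exists(dime_dir)) {
  setwd(dime_dir)
  # character(0) means every listed file was found in the working directory.
  print(missing_inputs(getwd()))
}
```

Extending the required vector with the other downloaded files gives a quick check before running the scripts.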
In order to visualise the modules obtained from DiME, we recommend that the user has Cytoscape installed on the machine.
1. Coexpression Network Construction and Module Extraction:
After having all downloads unzipped into an empty folder and switching the R working directory to that folder, execute:
in the R console. This will automatically build a binary coexpression
network (output in file "G4_network.txt") from the glioblastoma
expression matrix (in file "G4_expression.csv") and extract modules using
the DiME algorithm. The script outputs each module in a separate text
file containing the Entrez gene IDs of the module's members. It also
creates another representation of the coexpression network in which all
vertices are named by consecutive integers to facilitate the B-score
calculation.
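To illustrate what a binary coexpression network and its integer-labelled representation look like, here is a small sketch on toy data. It is not the script from the downloads: the jackknife rule used here (keep the weakest absolute Pearson correlation over all leave-one-sample-out subsets), the cutoff of 0.8, the function name jackknife_cor and the toy gene IDs are all assumptions made for illustration.

```r
# Toy expression matrix: rows are genes (illustrative IDs), columns are samples.
set.seed(1)
expr <- matrix(rnorm(5 * 20), nrow = 5,
               dimnames = list(c("1017", "1956", "5728", "7157", "4609"), NULL))
# Make two genes strongly coexpressed so that the toy network has an edge.
expr["1956", ] <- expr["1017", ] + rnorm(20, sd = 0.1)

# Assumed jackknife rule: for each gene pair, the weakest absolute correlation
# observed when each sample is left out in turn.
jackknife_cor <- function(x) {
  jk <- matrix(Inf, nrow(x), nrow(x),
               dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(ncol(x))) {
    jk <- pmin(jk, abs(cor(t(x[, -i, drop = FALSE]))))
  }
  jk
}

threshold <- 0.8                      # assumed cutoff, not taken from the tutorial
jk  <- jackknife_cor(expr)
adj <- jk >= threshold                # binary adjacency matrix

# Edge list over gene IDs, one row per undirected edge (upper triangle only).
idx <- which(adj & upper.tri(adj), arr.ind = TRUE)
edges_entrez <- data.frame(from = rownames(adj)[idx[, "row"]],
                           to   = colnames(adj)[idx[, "col"]],
                           stringsAsFactors = FALSE)

# Same network with vertices renamed to consecutive integers.
genes <- sort(unique(c(edges_entrez$from, edges_entrez$to)))
edges_int <- data.frame(from = match(edges_entrez$from, genes),
                        to   = match(edges_entrez$to, genes))
```

Keeping the genes vector alongside edges_int makes it possible to map integer vertex labels back to gene IDs afterwards.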
2. Evaluating statistical significance of DiME modules:
We have adopted the B-score significance measure from the work of
Lancichinetti et al. (2011). The B-score algorithm is provided as C++
source code and needs to be compiled before it can be executed. In the
downloads we provide the source code, as well as the compiled executables
Bscore (Unix) and Bscore.exe (Windows). If the user would like to compile
their own executables, please use (for gcc compilers):
g++ -o Bscore compare.cpp -lgsl -lgslcblas -lm
Note that compiling the source code requires the GNU Scientific Library (GSL) to be installed. GSL is statically linked in the executables we’ve provided, so users of those executables do not need to install any dependencies.
The user is referred to the paper (Lancichinetti et al., 2011) and the original software’s website for more details.
Assuming that the user has placed the B-score executable in the working directory with the data files, we can now calculate the B-scores of the modules we have obtained by switching to the working directory on the command line and running the B-score executable (./Bscore should be changed to Bscore.exe on a Windows machine).
The algorithm will output a file that records the module sizes and
B-scores (averaged from 20 runs for each module).
3. (Optional) Visualise statistically significant modules in Cytoscape:
We can build a network of the interconnectivity of statistically
significant modules to see how activity of different modules might be
coordinately regulated. We prepare the edgelist, edge weights and node
attributes in formatted text using R and import the network into
Cytoscape. Here we also include a non-tumour sample expression matrix
for calculating fold-change of gene expression to represent changes in
module activity. To build a network of module interconnectivity, execute:
in the R console. The script will output two files: the edge list of the
module network, and the node attributes (including module size and
average expression log fold-change). Both can be imported into Cytoscape
to visualise modules.
Self-loops generated in the graph can be removed using the
Remove Self-loops command in Cytoscape.
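As a rough illustration of what the edge list and node attribute tables contain, the following sketch builds both from toy data. It is not the script from the downloads: the gene and module names, the choice of edge weight (number of gene-level edges linking two modules) and the assumption that expression values are on a log2 scale are all made up for this example.

```r
# Toy inputs (all hypothetical): gene-level edges and module membership.
edges <- data.frame(from = c("g1", "g1", "g2", "g3", "g4"),
                    to   = c("g2", "g3", "g4", "g4", "g5"),
                    stringsAsFactors = FALSE)
module_of <- c(g1 = "M1", g2 = "M1", g3 = "M2", g4 = "M2", g5 = "M3")

# Map each gene-level edge to the (ordered) pair of modules it connects.
m_from <- unname(module_of[edges$from])
m_to   <- unname(module_of[edges$to])
pair <- data.frame(from = ifelse(m_from <= m_to, m_from, m_to),
                   to   = ifelse(m_from <= m_to, m_to, m_from),
                   weight = 1,
                   stringsAsFactors = FALSE)

# Assumed edge weight: number of gene-level edges between two modules.
# Edges inside one module appear as self-loops (from == to).
module_edges <- aggregate(weight ~ from + to, data = pair, FUN = sum)

# Toy log2 expression matrices (genes x samples) for tumour and non-tumour samples.
genes  <- names(module_of)
tumour <- matrix(rep(c(8, 9, 6, 7, 5), times = 3), nrow = 5,
                 dimnames = list(genes, NULL))
normal <- matrix(rep(c(7, 7, 7, 7, 5), times = 3), nrow = 5,
                 dimnames = list(genes, NULL))

# On a log2 scale, the log fold-change is a difference of mean expression.
log_fc <- rowMeans(tumour) - rowMeans(normal)

# Node attributes: module size and average log fold-change of its genes.
modules <- sort(unique(unname(module_of)))
node_attr <- data.frame(
  module      = modules,
  size        = as.vector(table(module_of)[modules]),
  mean_log_fc = as.vector(tapply(log_fc, module_of[names(log_fc)], mean)[modules]),
  stringsAsFactors = FALSE
)
```

Both data frames can be written as tab-delimited text with write.table (using sep = "\t", quote = FALSE and row.names = FALSE) for import into Cytoscape.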