GP-Pi: Using Genetic Programming with Penalization and Initialization on Genome-Wide Association Study

Created by W.Langdon from gp-bibliography.bib Revision:1.4208

  author =       "Ho-Yin Sze-To and Kwan-Yeung Lee and Kai-Yuen Tso and 
                 Man Hon Wong and Kin-Hong Lee and Nelson L. S. Tang and 
                 Kwong-Sak Leung",
  title =        "{GP-Pi}: Using Genetic Programming with Penalization
                 and Initialization on Genome-Wide Association Study",
  bibdate =      "2013-06-07",
  bibsource =    "DBLP",
  booktitle =    "Artificial Intelligence and Soft Computing - 12th
                 International Conference, {ICAISC} 2013, Zakopane,
                 Poland, June 9-13, 2013, Proceedings, Part {II}",
  publisher =    "Springer",
  year =         "2013",
  volume =       "7895",
  editor =       "Leszek Rutkowski and Marcin Korytkowski and 
                 Rafal Scherer and Ryszard Tadeusiewicz and Lotfi A. Zadeh and 
                 Jacek M. Zurada",
  keywords =     "genetic algorithms, genetic programming",
  isbn13 =       "978-3-642-38609-1",
  pages =        "330--341",
  series =       "Lecture Notes in Computer Science",
  URL =          "",
  DOI =          "doi:10.1007/978-3-642-38610-7_31",
  abstract =     "The advancement of chip-based technology has enabled
                 the measurement of millions of DNA sequence variations
                 across the human genome. Experiments revealed that
                 high-order, but not individual, interactions of single
                 nucleotide polymorphisms (SNPs) are responsible for
                 complex diseases such as cancer. The challenge of
                 genome-wide association studies (GWASs) is to sift
                 through high-dimensional datasets to find out
                 particular combinations of SNPs that are predictive of
                 these diseases. Genetic Programming (GP) has been
                 widely applied in GWASs. It serves two purposes:
                 attribute selection and/or discriminative modelling.
                 One advantage of discriminative modelling over
                 attribute selection lies in interpretability. However,
                 existing discriminative modelling algorithms do not
                 scale up well with the increase in the SNP dimension.
                 Here, we have developed GP-Pi. We have introduced a
                 penalising term in the fitness function to penalise
                 trees with common SNPs and an initialiser which uses
                 expert knowledge to seed the population with good
                 attributes. Experimental results on simulated data
                 suggested that GP-Pi outperforms GPAS with
                 statistical significance. GP-Pi was further evaluated
                 on a real GWAS dataset of Rheumatoid Arthritis,
                 obtained from the North American Rheumatoid Arthritis
                 Consortium. Our results, with potential new
                 discoveries, are found to be consistent with
