Detecting high-order interactions of single nucleotide polymorphisms using genetic programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4192

  author =       "Robin Nunkesser and Thorsten Bernholt and 
                 Holger Schwender and Katja Ickstadt and Ingo Wegener",
  title =        "Detecting high-order interactions of single nucleotide
                 polymorphisms using genetic programming",
  journal =      "Bioinformatics",
  year =         "2007",
  volume =       "23",
  number =       "24",
  pages =        "3280--3288",
  month =        "15 " # dec,
  email =        "",
  keywords =     "genetic algorithms, genetic programming",
  ISSN =         "1460-2059",
  language =     "en",
  oai =          "oai:CiteSeerX.psu:",
  URL =          "",
  DOI =          "doi:10.1093/bioinformatics/btm522",
  size =         "9 pages",
  abstract =     "Motivation: Not individual single nucleotide
                 polymorphisms (SNPs), but high-order interactions of
                 SNPs are assumed to be responsible for complex diseases
                 such as cancer. Therefore, one of the major goals of
                 genetic association studies concerned with such
                 genotype data is the identification of these high-order
                 interactions. This search is additionally impeded by
                 the fact that these interactions often are only
                 explanatory for a relatively small subgroup of
                 patients. Most of the feature selection methods
                 proposed in the literature, unfortunately, fail at this
                 task, since they can either only identify individual
                 variables or interactions of a low order, or try to
                 find rules that are explanatory for a high percentage
                 of the observations. In this article, we present a
                 procedure based on genetic programming and multi-valued
                 logic that enables the identification of high-order
                 interactions of categorical variables such as SNPs.
                 This method, called GPAS, can not only be used for feature
                 selection, but can also be employed for

                 Results: In an application to the genotype data from
                 the GENICA study, an association study concerned with
                 sporadic breast cancer, GPAS is able to identify
                 high-order interactions of SNPs leading to a
                 considerably increased breast cancer risk for different
                 subsets of patients that are not found by other feature
                 selection methods. As an application to a subset of the
                 HapMap data shows, GPAS is not restricted to
                 association studies comprising several tens of SNPs, but can
                 also be employed to analyse whole-genome data.",
  notes =        "Software can be downloaded from

                 Preliminary Version: Technical Report 24/2007, SFB 475,
                 Universität Dortmund, Germany.

                 Disjunctive Normal Form (fast bit set implementation).
                 Variable population size (only the individuals which are
                 not dominated are kept). Pareto (3 objectives: size,
                 true positives (TP), true negatives (TN)). No fitness
                 sharing? Non-standard selection. Crossover. 5 DNF
                 specific mutations.

                 GPAS better than Logic regression, CART, Bagging,
                 Random Forests on GENICA HapMap and (Random) Simulated
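
The notes above mention a bit set representation of DNF expressions and a Pareto population over three objectives (size, TP, TN). The following is a minimal illustrative sketch of those two ideas, not the GPAS implementation. All names (literal_mask, dnf_mask, objectives, dominates, non_dominated) are hypothetical, genotypes are assumed to be coded 0, 1 or 2, and it is assumed that size is minimised while TP and TN are maximised.

```python
from typing import List, Sequence, Tuple

# A literal tests one SNP against one genotype value (assumed coding 0/1/2);
# negated=True means "SNP != value". These type names are illustrative only.
Literal = Tuple[int, int, bool]
Monomial = List[Literal]   # conjunction (AND) of literals
DNF = List[Monomial]       # disjunction (OR) of monomials


def literal_mask(genotypes: Sequence[Sequence[int]], literal: Literal) -> int:
    """Return an integer bit set: bit i is 1 when individual i satisfies the literal."""
    snp, value, negated = literal
    mask = 0
    for i, row in enumerate(genotypes):
        if (row[snp] == value) != negated:
            mask |= 1 << i
    return mask


def dnf_mask(genotypes: Sequence[Sequence[int]], dnf: DNF) -> int:
    """Evaluate a DNF on all individuals at once using bitwise AND / OR."""
    all_ones = (1 << len(genotypes)) - 1
    result = 0
    for monomial in dnf:
        m = all_ones
        for lit in monomial:
            m &= literal_mask(genotypes, lit)
        result |= m
    return result


def objectives(genotypes, is_case: Sequence[bool], dnf: DNF) -> Tuple[int, int, int]:
    """Return (size, TP, TN); size is counted here as the number of literals."""
    all_ones = (1 << len(genotypes)) - 1
    case_mask = sum(1 << i for i, c in enumerate(is_case) if c)
    predicted = dnf_mask(genotypes, dnf)
    tp = bin(predicted & case_mask).count("1")
    tn = bin(~predicted & ~case_mask & all_ones).count("1")
    size = sum(len(m) for m in dnf)
    return size, tp, tn


def dominates(a: Tuple[int, int, int], b: Tuple[int, int, int]) -> bool:
    """a dominates b: not larger, no fewer TP, no fewer TN, and not identical."""
    return a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2] and a != b


def non_dominated(scored: List[Tuple[DNF, Tuple[int, int, int]]]):
    """Keep only individuals whose objective vector no other individual dominates."""
    return [(d, s) for d, s in scored
            if not any(dominates(t, s) for _, t in scored)]
```

Because only non-dominated individuals survive the filter, the number of survivors varies from generation to generation, which is one way to read the "variable population size" remark in the notes.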
