Genome-Wide Genetic Analysis Using Genetic Programming: The Critical Need for Expert Knowledge

Created by W. Langdon from gp-bibliography.bib Revision: 1.4524

  author =       "Jason H. Moore and Bill C. White",
  title =        "Genome-Wide Genetic Analysis Using Genetic
                 Programming: The Critical Need for Expert Knowledge",
  booktitle =    "Genetic Programming Theory and Practice {IV}",
  year =         "2006",
  editor =       "Rick L. Riolo and Terence Soule and Bill Worzel",
  volume =       "5",
  series =       "Genetic and Evolutionary Computation",
  pages =        "11--28",
  address =      "Ann Arbor",
  month =        "11-13 " # may,
  publisher =    "Springer",
  keywords =     "genetic algorithms, genetic programming",
  ISBN =         "0-387-33375-4",
  DOI =          "doi:10.1007/978-0-387-49650-4_2",
  size =         "16 pages",
  abstract =     "Human genetics is undergoing an information explosion.
                 The availability of chip-based technology facilitates
                 the measurement of thousands of DNA sequence variations
                 from across the human genome. The challenge is to sift
                 through these high-dimensional datasets to identify
                 combinations of interacting DNA sequence variations
                 that are predictive of common diseases. The goal of
                 this study is to develop and evaluate a genetic
                 programming (GP) approach to attribute selection and
                 classification in this domain. We simulated genetic
                 datasets of varying size in which the disease model
                 consists of two interacting DNA sequence variations
                 that exhibit no independent effects on class (i.e.
                 epistasis). We show that GP is no better than a simple
                 random search when classification accuracy is used as
                 the fitness function. We then show that including
                 pre-processed estimates of attribute quality using
                 Tuned ReliefF (TuRF) in a multi-objective fitness
                 function that also includes accuracy significantly
                 improves the performance of GP over that of random
                 search. This study demonstrates that GP may be a useful
                 computational discovery tool in this domain. This study
                 raises important questions about the general utility of
                 GP for these types of problems, the importance of data
                 pre-processing, the ideal functional form of the
                 fitness function, and the importance of expert
                 knowledge. We anticipate this study will provide an
                 important baseline for future studies investigating the
                 usefulness of GP as a general computational discovery
                 tool for large-scale genetic studies.",
  notes =        "part of \cite{Riolo:2006:GPTP}. Published Jan 2007,
                 after the workshop",
