Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies

Created by W.Langdon from gp-bibliography.bib Revision:1.3973

@Article{Yang:2014:EJHG,
  title =        "Random forest fishing: a novel approach to identifying
                 organic group of risk factors in genome-wide
                 association studies",
  author =       "Wei Yang and C. Charles Gu",
  journal =      "European Journal of Human Genetics",
  year =         "2014",
  volume =       "22",
  pages =        "254--259",
  month =        may # "~22",
  keywords =     "genetic algorithms, genetic programming, genome-wide
                 association, statistical learning, random forest,
                 epistasis, interactions",
  ISSN =         "1018-4813",
  bibsource =    "OAI-PMH server at www.ncbi.nlm.nih.gov",
  language =     "en",
  oai =          "oai:pubmedcentral.nih.gov:3895629",
  rights =       "Copyright 2014 Macmillan Publishers Limited",
  URL =          "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC",
  URL =          "http://www.ncbi.nlm.nih.gov/pubmed/23695277",
  DOI =          "10.1038/ejhg.2013.109",
  size =         "6 pages",
  abstract =     "Genome-wide association studies (GWAS) have brought
                 methodological challenges in handling massive
                 high-dimensional data and also real opportunities for
                 studying the joint effect of many risk factors acting
                 in concert as an organic group. The random forest (RF)
                 methodology is recognised by many for its potential in
                 examining interaction effects in large data sets.
                 However, RF is not designed to directly handle GWAS
                 data, which typically have hundreds of thousands of
                 single-nucleotide polymorphisms as predictor variables.
                 We propose and evaluate a novel extension of RF, called
                 random forest fishing (RFF), for GWAS analysis. RFF
                 repeatedly updates a relatively small set of predictors
                 obtained by RF tests to find globally important groups
                 predictive of the disease phenotype, using a novel
                 search algorithm based on genetic programming and
                 simulated annealing. A key improvement of RFF results
                 from the use of guidance incorporating empirical test
                 results of genome-wide pairwise interactions. Evaluated
                 using simulated and real GWAS data sets, RFF is shown
                 to be effective in identifying important predictors,
                 particularly when both marginal effects and
                 interactions exist, and is applicable to very large
                 GWAS data sets.",
}
