The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases

Created by W.Langdon from gp-bibliography.bib Revision:1.4448

  title =        "The challenge for genetic epidemiologists: how to
                 analyze large numbers of {SNP}s in relation to complex
                 diseases",
  author =       "A Geert Heidema and Jolanda M A Boer and 
                 Nico Nagelkerke and Edwin C M Mariman and 
                 Daphne L {van der A} and Edith J M Feskens",
  year =         "2006",
  month =        apr # "~21",
  journal =      "BMC Genetics",
  volume =       "7",
  number =       "23",
  publisher =    "BioMed Central Ltd.",
  bibsource =    "OAI-PMH server at",
  language =     "en",
  oai =          "",
  rights =       "Copyright 2006 Heidema et al; licensee BioMed Central
  type =         "Commentary",
  keywords =     "genetic algorithms, genetic programming",
  ISSN =         "1471-2156",
  URL =          "",
  DOI =          "doi:10.1186/1471-2156-7-23",
  size =         "15 pages",
  abstract =     "Genetic epidemiologists have taken the challenge to
                 identify genetic polymorphisms involved in the
                 development of diseases. Many have collected data on
                 large numbers of genetic markers but are not familiar
                 with available methods to assess their association with
                 complex diseases. Statistical methods have been
                 developed for analysing the relation between large
                 numbers of genetic and environmental predictors to
                 disease or disease-related variables in genetic
                 association studies.

                 In this commentary we discuss logistic regression
                 analysis, neural networks, including the parameter
                 decreasing method (PDM) and genetic programming
                 optimised neural networks (GPNN) and several
                 non-parametric methods, which include the set
                 association approach, combinatorial partitioning method
                 (CPM), restricted partitioning method (RPM),
                 multifactor dimensionality reduction (MDR) method and
                 the random forests approach. The relative strengths and
                 weaknesses of these methods are highlighted.

                 Logistic regression and neural networks can handle only
                 a limited number of predictor variables, depending on
                 the number of observations in the dataset. Therefore,
                 they are less useful than the non-parametric methods to
                 approach association studies with large numbers of
                 predictor variables. GPNN on the other hand may be a
                 useful approach to select and model important
                 predictors, but its performance to select the important
                 effects in the presence of large numbers of predictors
                 needs to be examined. Both the set association approach
                 and random forests approach are able to handle a large
                 number of predictors and are useful in reducing these
                 predictors to a subset of predictors with an important
                 contribution to disease. The combinatorial methods give
                 more insight in combination patterns for sets of
                 genetic and/or environmental predictor variables that
                 may be related to the outcome variable. As the
                 non-parametric methods have different strengths and
                 weaknesses we conclude that to approach genetic
                 association studies using the case-control design, the
                 application of a combination of several methods,
                 including the set association approach, MDR and the
                 random forests approach, will likely be a useful
                 strategy to find the important genes and interaction
                 patterns involved in complex diseases.",
  notes =        "Open Access",

Genetic Programming entries for A Geert Heidema, Jolanda M A Boer, Nico Nagelkerke, Edwin C M Mariman, Daphne L van der A, Edith J M Feskens