Toward the Automated Analysis of Complex Diseases in Genome-wide Association Studies Using Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Andrew Sohn and Randal S. Olson and Jason H. Moore",
  title =        "Toward the Automated Analysis of Complex Diseases in
                 Genome-wide Association Studies Using Genetic
                 Programming",
  booktitle =    "Proceedings of the Genetic and Evolutionary
                 Computation Conference",
  series =       "GECCO '17",
  year =         "2017",
  isbn13 =       "978-1-4503-4920-8",
  address =      "Berlin, Germany",
  pages =        "489--496",
  size =         "8 pages",
  URL =          "",
  DOI =          "doi:10.1145/3071178.3071212",
  acmid =        "3071212",
  publisher =    "ACM",
  publisher_address = "New York, NY, USA",
  keywords =     "genetic algorithms, genetic programming, automated
                 machine learning, bioinformatics, genetics, multifactor
                 dimensionality reduction, python",
  month =        "15-19 " # jul,
  abstract =     "Machine learning has been gaining traction in recent
                 years to meet the demand for tools that can efficiently
                 analyse and make sense of the ever-growing databases of
                 biomedical data in health care systems around the
                 world. However, effectively using machine learning
                 methods requires considerable domain expertise, which
                 can be a barrier of entry for bioinformaticians new to
                 computational data science methods. Therefore,
                 off-the-shelf tools that make machine learning more
                 accessible can prove invaluable for bioinformaticians.
                 To this end, we have developed an open source pipeline
                 optimization tool (TPOT-MDR) that uses genetic
                 programming to automatically design machine learning
                 pipelines for bioinformatics studies. In TPOT-MDR, we
                 implement Multifactor Dimensionality Reduction (MDR) as
                 a feature construction method for modelling
                 higher-order feature interactions, and combine it with
                 a new expert knowledge-guided feature selector for
                 large biomedical data sets. We demonstrate TPOT-MDR's
                 capabilities using a combination of simulated and real
                 world data sets from human genetics and find that
                 TPOT-MDR significantly outperforms modern machine
                 learning methods such as logistic regression and
                 eXtreme Gradient Boosting (XGBoost). We further analyse
                 the best pipeline discovered by TPOT-MDR for a real
                 world problem and highlight TPOT-MDR's ability to
                 produce a high-accuracy solution that is also easily
                 interpretable.",
  notes =        "Also known as \cite{Sohn:2017:TAA:3071178.3071212}
                 GECCO-2017 A Recombination of the 26th International
                 Conference on Genetic Algorithms (ICGA-2017) and the
                 22nd Annual Genetic Programming Conference (GP-2017)",

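The abstract describes MDR as a feature-construction step for modelling higher-order interactions between features. As a rough illustration only, here is a minimal sketch of the classic MDR idea: group samples into multi-locus genotype cells, label each cell high risk (1) or low risk (0) by comparing its case:control ratio with the overall ratio, then collapse the SNPs into one binary feature. It assumes genotypes are tuples of SNP codes (e.g. 0/1/2), labels are 1 for cases and 0 for controls, and both classes are present. The names `fit_mdr` and `transform_mdr` are hypothetical, and this is not TPOT-MDR's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical type alias: one multi-locus genotype, e.g. (0, 2) for two SNPs.
Genotype = Tuple[int, ...]


def fit_mdr(genotypes: Sequence[Genotype], labels: Sequence[int]) -> Dict[Genotype, int]:
    """Label each multi-locus genotype cell as high risk (1) or low risk (0).

    Sketch assumptions: labels are 1 (case) / 0 (control) and both classes occur.
    """
    cases: Dict[Genotype, int] = defaultdict(int)
    controls: Dict[Genotype, int] = defaultdict(int)
    for g, y in zip(genotypes, labels):
        if y == 1:
            cases[g] += 1
        else:
            controls[g] += 1

    # Compare each cell with the overall case:control ratio;
    # for balanced data this threshold is exactly 1.
    threshold = sum(cases.values()) / sum(controls.values())

    cells: Dict[Genotype, int] = {}
    for g in set(cases) | set(controls):
        if controls[g] == 0:
            # Cell contains cases but no controls: treat it as high risk.
            cells[g] = 1
        else:
            cells[g] = int(cases[g] / controls[g] > threshold)
    return cells


def transform_mdr(
    cells: Dict[Genotype, int], genotypes: Sequence[Genotype], unseen: int = 0
) -> List[int]:
    """Collapse the interacting SNPs into one binary constructed feature.

    Genotype cells never seen during fitting get the label `unseen`
    (an assumption of this sketch).
    """
    return [cells.get(g, unseen) for g in genotypes]


if __name__ == "__main__":
    # Toy XOR-style interaction: neither SNP alone separates cases from controls.
    X = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (0, 1), (1, 0), (1, 0)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    cells = fit_mdr(X, y)
    print(transform_mdr(cells, X))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In the toy data each SNP on its own splits cases and controls evenly, yet the constructed feature reproduces the labels exactly, which is the kind of pure interaction effect MDR is meant to capture. The ratio threshold and the default for unseen cells are design choices of this sketch.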