Automating Biomedical Data Science Through Tree-Based Pipeline Optimization

Created by W.Langdon from gp-bibliography.bib Revision:1.4549

  author =       "Randal S. Olson and Ryan J. Urbanowicz and 
                 Peter C. Andrews and Nicole A. Lavender and {La Creis} Kidd and 
                 Jason H. Moore",
  title =        "Automating Biomedical Data Science Through Tree-Based
                 Pipeline Optimization",
  booktitle =    "Proceedings of the 19th European Conference on
                 Applications of Evolutionary Computation,
                 EvoApplications 2016, Part I",
  year =         "2016",
  editor =       "Giovanni Squillero and Paolo Burelli",
  volume =       "9597",
  series =       "LNCS",
  pages =        "123--137",
  address =      "Porto, Portugal",
  month =        mar # " 30 -- " # apr # " 1",
  publisher =    "Springer",
  note =         "Best paper, EvoBio track",
  keywords =     "genetic algorithms, genetic programming",
  isbn13 =       "978-3-319-31204-0",
  DOI =          "10.1007/978-3-319-31204-0_9",
  abstract =     "Over the past decade, data science and machine
                 learning have grown from a mysterious art form to a
                 staple tool across a variety of fields in academia,
                 business, and government. In this paper, we introduce
                 the concept of tree-based pipeline optimization for
                 automating one of the most tedious parts of machine
                 learning --- pipeline design. We implement a Tree-based
                 Pipeline Optimization Tool (TPOT) and demonstrate its
                 effectiveness on a series of simulated and real-world
                 genetic data sets. In particular, we show that TPOT can
                 build machine learning pipelines that achieve
                 competitive classification accuracy and discover novel
                 pipeline operators --- such as synthetic feature
                 constructors --- that significantly improve
                 classification accuracy on these data sets. We also
                 highlight the current challenges to pipeline
                 optimization, such as the tendency to produce pipelines
                 that overfit the data, and suggest future research
                 paths to overcome these challenges. As such, this work
                 represents an early step toward fully automating
                 machine learning pipeline design.",

Genetic Programming entries for Randal S Olson, Ryan J Urbanowicz, Peter C Andrews, Nicole A Lavender, La Creis Renee Kidd, Jason H Moore