Genetic Programming for the Induction of Decision Trees to Model Ecotoxicity Data

Created by W.Langdon from gp-bibliography.bib Revision:1.4208

  author =       "Frances V. Buontempo and Xue Zhong Wang and 
                 Mulaisho Mwense and Nigel Horan and Anita Young and 
                 Daniel Osborn",
  title =        "Genetic Programming for the Induction of Decision
                 Trees to Model Ecotoxicity Data",
  journal =      "Journal of Chemical Information and Modeling",
  year =         "2005",
  volume =       "45",
  pages =        "904--912",
  note =         "ASAP article. Web Release Date: May 12, 2005",
  keywords =     "genetic algorithms, genetic programming, decision
                 trees, model ecotoxicity, EPTree, C5.0 See5, recursive
                 partitioning, S-Plus, SIMCA-P 8.0, QSAR",
  DOI =          "doi:10.1021/ci049652n",
  size =         "9 pages",
  abstract =     "Automatic induction of decision trees and production
                 rules from data to develop structure-activity models
                 for toxicity prediction has recently received much
                 attention, and the majority of methodologies reported
                 in the literature are based upon recursive partitioning
                 employing greedy searches to choose the best splitting
                 attribute and value at each node. These approaches can
                 be successful; however, the greedy search will
                 necessarily miss regions of the search space. Recent
                 literature has demonstrated the applicability of
                 genetic programming to decision tree induction to
                 overcome this problem. This paper presents a variant of
                 this novel approach, using fewer mutation options and a
                 simpler fitness function, demonstrating its utility in
                 inducing decision trees for ecotoxicity data, via a
                 case study of two data sets giving improved accuracy
                 and generalization ability over a popular decision tree
  notes =        "

                 S1549-9596(04)09652-4 ACS Publications Division

                 cites EPtree \cite{delisle:2004:CIM} y-scrambling. at
                 least 10\% data coverage required of decision trees.
                 Tournament size 16. No parsimony fitness preassure.
                 Trees regrown. Lots of mutation if pop stagnated.
                 Elitist but gives no improvement. -Log(LC50) vibrio
                 fischeri. 1093 features. 60 training compounds. 100
                 generation. Pop 600. 1 second per

                 Department of Chemical Engineering and School of Civil
                 Engineering, University of Leeds, Leeds LS2 9JT, U.K.,
                 AstraZeneca UK Ltd., Brixham Environmental Laboratory,
                 Freshwater Quarry, Brixham, Devon TQ5 8BA, U.K., and
                 Centre of Ecology and Hydrology, Monks Wood, Huntingdon
                 PE28 2LS, U.K.",

Genetic Programming entries for Frances V Buontempo Xue Zhong Wang Mulaisho Mwense Nigel Horan Anita Young Daniel Osborn