Inductive data mining: automatic generation of decision trees from data for QSAR modelling and process historical data analysis

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

@Article{Ma:2011:IJMIC,
  author =       "Chao Y. Ma and Frances V. Buontempo and Xue Z. Wang",
  title =        "Inductive data mining: automatic generation of
                 decision trees from data for QSAR modelling and process
                 historical data analysis",
  journal =      "International Journal of Modelling, Identification and
                 Control",
  year =         "2011",
  volume =       "12",
  number =       "1/2",
  pages =        "101--106",
  keywords =     "genetic algorithms, genetic programming, inductive
                 data mining, decision trees, quantitative structure
                 activity relationships, QSAR, process historical data
                 analysis, wastewater treatment, modelling, eco-toxicity
                 prediction",
  ISSN =         "1746-6180",
  language =     "eng",
  URL =          "http://www.inderscience.com/link.php?id=37837",
  DOI =          "10.1504/IJMIC.2011.037837",
  publisher =    "Inderscience Publishers",
  bibsource =    "OAI-PMH server at www.inderscience.com",
  rights =       "Inderscience Copyright",
  abstract =     "A new inductive data mining method for automatic
                 generation of decision trees from data (GPTree) is
                 presented. Other decision tree induction techniques
                 are based upon recursive partitioning, employing greedy
                 searches to choose the best splitting attribute and
                 value at each node, and will therefore necessarily miss
                 regions of the search space; GPTree can overcome this
                 problem. In addition, the approach is
                 extended to a new method (YAdapt) that models the
                 original continuous endpoint by adaptively finding
                 suitable ranges to describe the endpoints during the
                 tree induction process, removing the need for
                 discretisation prior to tree induction and allowing the
                 ordinal nature of the endpoint to be taken into account
                 in the models built. A strategy for further improving
                 the predictive performance for previously unseen data
                 is investigated that uses multiple decision trees,
                 i.e., a decision forest, and a majority voting strategy
                 to give predictions (GPForest). The methods were
                 applied to QSAR (quantitative structure-activity
                 relationships) modelling for eco-toxicity prediction of
                 chemicals and to the analysis of a historical database
                 for a wastewater treatment plant.",
}

Genetic Programming entries for Cai-Yun Ma, Frances V Buontempo, Xue Zhong Wang
