Obtaining Accurate and Comprehensible Data Mining Models - An Evolutionary Approach

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Ulf Johansson",
  title =        "Obtaining Accurate and Comprehensible Data Mining
                 Models - An Evolutionary Approach",
  school =       "Linkoping University, Department of Computer and
                 Information Science",
  year =         "2007",
  type =         "doctoral thesis",
  address =      "SE-581 83, Linkoping, Sweden",
  number =       "1086",
  keywords =     "genetic algorithms, genetic programming, rule
                 extraction, ensembles, data mining, artificial neural
  URL =          "http://hdl.handle.net/2320/2136",
  URL =          "http://bada.hb.se/bitstream/2320/2136/1/AvhandlingFinal.pdf",
  isbn13 =       "978-91-85715-34-3",
  ISSN =         "0345-7524",
  size =         "272 pages",
  abstract =     "When performing predictive data mining, the use of
                 ensembles is claimed to virtually guarantee increased
                 accuracy compared to the use of single models.
                 Unfortunately, the problem of how to maximise ensemble
                 accuracy is far from solved. In particular, the
                 relationship between ensemble diversity and accuracy is
                 not completely understood, making it hard to
                 efficiently use diversity for ensemble creation.
                 Furthermore, most high-accuracy predictive models are
                 opaque, i.e. it is not possible for a human to follow
                 and understand the logic behind a prediction. For some
                 domains, this is unacceptable, since models need to be
                 comprehensible. To obtain comprehensibility, accuracy
                 is often sacrificed by using simpler but transparent
                 models; a trade-off termed the accuracy vs.
                 comprehensibility trade-off. With this trade-off in
                 mind, several researchers have suggested rule
                 extraction algorithms, where opaque models are
                 transformed into comprehensible models, keeping an
                 acceptable accuracy.

                 In this thesis, two novel algorithms based on Genetic
                 Programming are suggested. The first algorithm (GEMS)
                 is used for ensemble creation, and the second (G-REX)
                 is used for rule extraction from opaque models. The
                 main property of GEMS is the ability to combine smaller
                 ensembles and individual models in an almost arbitrary
                 way. Moreover, GEMS can use base models of any kind and
                 the optimisation function is very flexible, easily
                 permitting inclusion of, for instance, diversity
                 measures. In the experimentation, GEMS obtained
                 accuracies higher than both straightforward design
                 choices and published results for Random Forests and
                 AdaBoost. The key quality of G-REX is the inherent
                 ability to explicitly control the accuracy vs.
                 comprehensibility trade-off. Compared to the standard
                 tree inducers C5.0 and CART, and some well-known rule
                 extraction algorithms, rules extracted by G-REX are
                 significantly more accurate and compact. Most
                 importantly, G-REX is thoroughly evaluated and found to
                 meet all relevant evaluation criteria for rule
                 extraction algorithms, thus establishing G-REX as the
                 algorithm to benchmark against.",

Genetic Programming entries for Ulf Johansson