Data mining with genetic algorithms on binary trees

Created by W.Langdon from gp-bibliography.bib Revision:1.4216

  author =       "Kenneth Sorensen and Gerrit K. Janssens",
  title =        "Data mining with genetic algorithms on binary trees",
  journal =      "European Journal of Operational Research",
  year =         "2003",
  volume =       "151",
  number =       "2",
  pages =        "253--264",
  note =         "Meta-heuristics in combinatorial optimisation",
  keywords =     "genetic algorithms, genetic programming, Data mining,
                 Binary trees",
  ISSN =         "0377-2217",
  DOI =          "doi:10.1016/S0377-2217(02)00824-X",
  size =         "16 pages",
  abstract =     "This paper focuses on the automatic interaction
                 detection (AID)-technique, which belongs to the class
                 of decision tree data mining techniques. The
                 AID-technique explains the variance of a dependent
                 variable through an exhaustive and repeated search of
                 all possible relations between the (binary) predictor
                 variables and the dependent variable. This search
                 results in a tree in which non-terminal nodes represent
                 the binary predictor variables, edges represent the
                 possible values of these predictor variables and
                 terminal nodes or leaves correspond to classes of
                 subjects. Despite being self-evident, the
                 AID-technique has its weaknesses. To overcome these
                 drawbacks a technique is developed that uses a genetic
                 algorithm to find a set of diverse classification
                 trees, all having a large explanatory power. From this
                 set of trees, the data analyst is able to choose the
                 tree that fulfils his requirements and does not suffer
                 from the weaknesses of the AID-technique. The technique
                 developed in this paper uses some specialised genetic
                 operators that are devised to preserve the structure of
                 the trees and to preserve high fitness from being
                 destroyed. An implementation of the algorithm exists
                 and is freely available. Some experiments were
                 performed which show that the algorithm uses an
                 intensification stage to find high-fitness trees. After
                 that, a diversification stage recombines high-fitness
                 building blocks to find a set of diverse solutions.",
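
The abstract describes AID as a repeated search over binary predictor variables that explains the variance of a dependent variable. The following is a minimal Python sketch of that greedy baseline only, not of the authors' genetic algorithm. It assumes the split criterion is the between-group sum of squares, which the abstract does not state, and all names (between_group_ss, grow_tree, min_size) are hypothetical.

```python
from statistics import fmean


def between_group_ss(rows, target, predictor):
    """Between-group sum of squares of `target` when `rows` are split
    on a binary (0/1) predictor. Assumed criterion, not from the paper."""
    overall = fmean(r[target] for r in rows)
    total = 0.0
    for value in (0, 1):
        group = [r[target] for r in rows if r[predictor] == value]
        if group:
            total += len(group) * (fmean(group) - overall) ** 2
    return total


def grow_tree(rows, target, predictors, min_size=2):
    """Greedy AID-style growth: split on the predictor that explains the
    most variance, then recurse on each branch with the rest."""
    leaf = {"leaf": True, "n": len(rows),
            "mean": fmean(r[target] for r in rows)}
    if len(rows) < min_size or not predictors:
        return leaf
    best = max(predictors, key=lambda p: between_group_ss(rows, target, p))
    if between_group_ss(rows, target, best) <= 0.0:
        return leaf  # no predictor explains any variance here
    rest = [p for p in predictors if p != best]
    children = {
        value: grow_tree([r for r in rows if r[best] == value],
                         target, rest, min_size)
        for value in (0, 1)  # edges are the two predictor values
    }
    return {"leaf": False, "split": best, "children": children}
```

Because each step commits to the single best split, a sketch like this returns exactly one tree. The paper's technique instead searches for a set of diverse trees with large explanatory power, from which the analyst chooses.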
