Towards Efficient Training on Large Datasets for Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Robert Curry and Malcolm I. Heywood",
  title =        "Towards Efficient Training on Large Datasets for
                 Genetic Programming",
  booktitle =    "17th Conference of the Canadian Society for
                 Computational Studies of Intelligence",
  year =         "2004",
  editor =       "Ahmed Y. Tawfik and Scott D. Goodwin",
  volume =       "3060",
  series =       "LNAI",
  pages =        "161--174",
  address =      "London, Ontario, Canada",
  month =        "17-19 " # may,
  publisher =    "Springer-Verlag",
  keywords =     "genetic algorithms, genetic programming",
  ISBN =         "3-540-22004-6",
  DOI =          "10.1007/b97823",
  abstract =     "Genetic programming (GP) has the potential to provide
                 unique solutions to a wide range of supervised learning
                 problems. The technique, however, does suffer from a
                 widely acknowledged computational overhead. As a
                 consequence, applications of GP are often confined to
                 datasets consisting of hundreds of training exemplars
                 as opposed to tens of thousands of exemplars, thus
                 limiting the widespread applicability of the approach.
                 In this work we propose and thoroughly investigate a
                 data sub-sampling algorithm, hierarchical dynamic subset
                 selection, that filters the initial training dataset in
                 parallel with the learning process. The motivation
                 is to focus the GP training on the most difficult or
                 least recently visited exemplars. To do so, we build on
                 the dynamic subset selection algorithm of Gathercole
                 \cite{ga94aGathercole} and extend it into a hierarchy
                 of subset selections, thus matching the concept of a
                 memory hierarchy supported in modern computers. Such an
                 approach provides for the training of GP solutions to
                 datasets with hundreds of thousands of exemplars in
                 tens of minutes whilst matching the classification
                 accuracies of more classical approaches.",
