Scaling Genetic Programming to Large Datasets Using Hierarchical Dynamic Subset Selection

Created by W.Langdon from gp-bibliography.bib Revision:1.4549

  author =       "Robert Curry and Peter Lichodzijewski and 
                 Malcolm I. Heywood",
  title =        "Scaling Genetic Programming to Large Datasets Using
                 Hierarchical Dynamic Subset Selection",
  journal =      "IEEE Transactions on Systems, Man, and Cybernetics,
                 Part B (Cybernetics)",
  year =         "2007",
  volume =       "37",
  number =       "4",
  pages =        "1065--1073",
  month =        aug,
  keywords =     "genetic algorithms, genetic programming, active
                 learning, classification, unbalanced data, hierarchical
                 DSS, RSS, linear genetic programming, casGP",
  ISSN =         "1083-4419",
  DOI =          "10.1109/TSMCB.2007.896406",
  size =         "9 pages",
  abstract =     "The computational overhead of Genetic Programming (GP)
                 may be directly addressed without recourse to hardware
                 solutions using active learning algorithms based on the
                 Random or Dynamic Subset Selection heuristics (RSS or
                 DSS). This work begins by presenting a family of
                 hierarchical DSS algorithms: RSS-DSS, cascaded RSS-DSS,
                 and the Balanced Block DSS algorithm; where the latter
                 has not been previously introduced. Extensive
                 benchmarking over four unbalanced real-world binary
                 classification problems with 30,000 to 500,000 training
                 exemplars demonstrates that both the cascade and
                 Balanced Block algorithms are able to reduce the
                 likelihood of degenerates, whilst providing a
                 significant improvement in classification accuracy
                 relative to the original RSS-DSS algorithm. Moreover,
                 comparison with GP trained without an active learning
                 algorithm indicates that classification performance is
                 not compromised, while training is completed in minutes
                 as opposed to half a day.",
  notes =        "max prog length=8, comparison with lilGP, binary
                 classification, unbalanced training sets, selecting
                 balanced training subsets, page based crossover",
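The abstract describes active learning via Dynamic Subset Selection (DSS), where GP fitness is evaluated only on a small subset of the training data, re-picked each generation so that hard and long-unseen exemplars are favoured. As a rough illustration of that weighting idea (not the paper's hierarchical RSS-DSS implementation), the sketch below biases selection by per-exemplar difficulty and age; the function name, exponents, and sampling details are assumptions for illustration only:

```python
import random

def dss_select(n_cases, difficulty, age, subset_size, d_exp=1.0, a_exp=1.0):
    """Sketch of DSS-style subset selection.

    difficulty[i] -- how often case i was misclassified when last evaluated
    age[i]        -- generations since case i last appeared in a subset
    Each case is weighted by difficulty^d_exp + age^a_exp, so hard cases and
    stale cases both become more likely to be picked.  Sampling uses
    random.choices (with replacement); duplicates are dropped, so the
    returned subset may be slightly smaller than subset_size.
    """
    weights = [difficulty[i] ** d_exp + age[i] ** a_exp for i in range(n_cases)]
    chosen = random.choices(range(n_cases), weights=weights, k=subset_size)
    return sorted(set(chosen))
```

In the hierarchical variants benchmarked in the paper, a scheme like this would operate at the inner level, inside a data block chosen at the outer level (e.g. by Random Subset Selection), which is what keeps training tractable on 30,000 to 500,000 exemplars.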

Genetic Programming entries for Robert Curry Peter Lichodzijewski Malcolm Heywood