A Comparison of Fitness-Case Sampling Methods for Symbolic Regression with Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Yuliana Martinez and Leonardo Trujillo and 
                 Enrique Naredo and Pierrick Legrand",
  title =        "A Comparison of Fitness-Case Sampling Methods for
                 Symbolic Regression with Genetic Programming",
  booktitle =    "EVOLVE - A Bridge between Probability, Set Oriented
                 Numerics, and Evolutionary Computation V",
  year =         "2014",
  editor =       "Alexandru-Adrian Tantar and Emilia Tantar and 
                 Jian-Qiao Sun and Wei Zhang and Qian Ding and 
                 Oliver Schuetze and Michael Emmerich and Pierrick Legrand and 
                 Pierre {Del Moral} and Carlos A. {Coello Coello}",
  volume =       "288",
  series =       "Advances in Intelligent Systems and Computing",
  pages =        "201--212",
  address =      "Peking",
  month =        "1-4 " # jul,
  publisher =    "Springer",
  keywords =     "genetic algorithms, genetic programming, Fitness-Case
                 Sampling, Symbolic Regression, Performance Evaluation",
  isbn13 =       "978-3-319-07493-1",
  DOI =          "doi:10.1007/978-3-319-07494-8_14",
  abstract =     "The canonical approach towards fitness evaluation in
                 Genetic Programming (GP) is to use a static training
                 set to determine fitness, based on a cost function
                 averaged over all fitness-cases. However, motivated by
                 different goals, researchers have recently proposed
                 several techniques that focus selective pressure on a
                 subset of fitness-cases at each generation. These
                 approaches can be described as fitness-case sampling
                 techniques, where the training set is sampled, in some
                 way, to determine fitness. This paper presents a
                 comprehensive evaluation of several recent sampling
                 methods, using benchmark and real-world problems for
                 symbolic regression. The algorithms considered are
                 Interleaved Sampling, Random Interleaved Sampling,
                 Lexicase Selection, and a new sampling technique
                 proposed here, Keep-Worst Interleaved Sampling
                 (KW-IS). The algorithms are extensively evaluated in
                 terms of test performance, overfitting and bloat.
                 Results suggest that sampling techniques can improve
                 performance compared with standard GP. While the
                 difference on synthetic benchmarks is slight or
                 nonexistent, on real-world problems it is
                 substantial. Some of the best results were achieved
                 by Lexicase Selection and Keep-Worst Interleaved
                 Sampling. Results also show that on real-world
                 problems overfitting correlates strongly with bloat.
                 Furthermore, the sampling techniques improve
                 efficiency, since they reduce the number of
                 fitness-case evaluations required over an entire run.",
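The abstract contrasts canonical GP fitness, which is a cost averaged over all fitness-cases, with methods that sample the training set. The Python sketch below illustrates three of these ideas. It rests on several assumptions that do not come from the paper:

- absolute error is used as the cost function;
- programs are plain Python callables on a single input;
- Lexicase Selection uses the standard exact-minimum filtering;
- Random Interleaved Sampling is read as "use the full training set with probability p_full, otherwise use one random fitness-case".

All names (canonical_fitness, lexicase_select, ris_cases, p_full) are hypothetical. KW-IS is not sketched, because the abstract does not describe how it works.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a fitness-case is an (x, y) pair, a program maps x to y.
Case = Tuple[float, float]
Program = Callable[[float], float]


def errors(program: Program, cases: Sequence[Case]) -> List[float]:
    """Absolute error of a program on each fitness-case (assumed cost function)."""
    return [abs(program(x) - y) for x, y in cases]


def canonical_fitness(program: Program, cases: Sequence[Case]) -> float:
    """Canonical GP fitness: the cost averaged over all fitness-cases (lower is better)."""
    errs = errors(program, cases)
    return sum(errs) / len(errs)


def lexicase_select(population: Sequence[Program], cases: Sequence[Case],
                    rng: random.Random) -> Program:
    """Pick one parent with standard lexicase selection (exact-minimum filtering)."""
    candidates = list(population)
    order = list(range(len(cases)))
    rng.shuffle(order)  # a fresh random case ordering for every selection event
    for i in order:
        x, y = cases[i]
        case_err = [abs(p(x) - y) for p in candidates]
        best = min(case_err)
        # Keep only the candidates that are elite on this single fitness-case.
        candidates = [p for p, e in zip(candidates, case_err) if e == best]
        if len(candidates) == 1:
            break
    # Any ties that survive every case are broken at random.
    return rng.choice(candidates)


def ris_cases(cases: Sequence[Case], p_full: float,
              rng: random.Random) -> List[Case]:
    """Assumed Random Interleaved Sampling: the whole training set with
    probability p_full, otherwise a single randomly chosen fitness-case."""
    if rng.random() < p_full:
        return list(cases)
    return [rng.choice(list(cases))]


if __name__ == "__main__":
    rng = random.Random(0)
    cases = [(float(x), float(x * x)) for x in range(-3, 4)]  # target y = x^2
    pop = [lambda x: x * x, lambda x: 2 * x, lambda x: x * x + 1]
    print(canonical_fitness(pop[0], cases))       # 0.0
    print(lexicase_select(pop, cases, rng)(2.0))  # 4.0
    print(len(ris_cases(cases, 0.5, rng)))        # 7 or 1, depending on the draw
```

In the usage example, the candidates are filtered one fitness-case at a time. The program 2*x ties the exact solution on x = 0 and x = 2, but it is eliminated on the first other case in the shuffled order, so lexicase selection always returns x*x here.

Exact-minimum filtering is strict for real-valued errors. Variants that relax this filter exist, but none is assumed here.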
