A Baseline Symbolic Regression Algorithm

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

  author =       "Michael F. Korns",
  title =        "A Baseline Symbolic Regression Algorithm",
  booktitle =    "Genetic Programming Theory and Practice X",
  year =         "2012",
  series =       "Genetic and Evolutionary Computation",
  editor =       "Rick Riolo and Ekaterina Vladislavleva and 
                 Marylyn D. Ritchie and Jason H. Moore",
  publisher =    "Springer",
  chapter =      "9",
  pages =        "117--137",
  address =      "Ann Arbor, USA",
  month =        "12-14 " # may,
  keywords =     "genetic algorithms, genetic programming, Abstract
                 expression grammars, Grammar template genetic
                 programming, Particle swarm, Symbolic regression",
  isbn13 =       "978-1-4614-6845-5",
  URL =          "http://dx.doi.org/10.1007/978-1-4614-6846-2_9",
  DOI =          "doi:10.1007/978-1-4614-6846-2_9",
  abstract =     "Recent advances in symbolic regression (SR) have
                 promoted the field into the early stages of commercial
                 exploitation. This is the expected maturation history
                 for an academic field which is progressing rapidly. The
                 original published symbolic regression algorithms in
                 (Koza 1994) have long since been replaced by techniques
                 such as Pareto front, age layered population
                 structures, and even age Pareto front optimisation. The
                 lack of specific techniques for optimising embedded
                 real numbers, in the original algorithms, has been
                 replaced with sophisticated techniques for optimizing
                 embedded constants. Symbolic regression is coming of
                 age as a technology.

                 As the discipline of Symbolic Regression (SR) has
                 matured, the first commercial SR packages have
                 appeared. At least one commercial package has been on
                 the market for several years http://www.rmltech.com/.
                 There is now at least one well documented commercial
                 symbolic regression package available for Mathematica
                 www.evolved-analytics.com. There is at least one very
                 well done open source symbolic regression package
                 available for free download
                 http://ccsl.mae.cornell.edu/eureqa. Yet, even as the
                 sophistication of commercial SR packages increases,
                 there have been glaring issues with SR accuracy even on
                 simple problems (Korns 2011). The depth and breadth of
                 SR adoption in industry and academia will be greatly
                 affected by the demonstrable accuracy of available SR
                 algorithms and tools.

                 In this chapter we develop a complete public domain
                 algorithm for modern symbolic regression which is
                 reasonably competitive with current commercial SR
                 packages, and calibrate its accuracy on a set of
                 previously published sample problems. This algorithm is
                 designed as a baseline for further public domain
                 research on SR algorithm simplicity and accuracy. No
                 claim is made placing this baseline algorithm on a par
                 with commercial packages, especially as the commercial
                 offerings can be expected to relentlessly improve in
                 the future. However, this baseline is a great
                 improvement over the original published algorithms, and
                 is an attempt to consolidate the latest published
                 research into a simplified baseline algorithm of
                 similar speed and accuracy.

                 The baseline algorithm presented herein is called Age
                 Weighted Pareto Optimisation. It is an amalgamation of
                 recent published techniques in Pareto front
                 optimization (Kotanchek et al., 2007), age layered
                 population structures (Hornby 2006), age fitness Pareto
                 optimization (Schmidt and Lipson 2010), and specialised
                 embedded abstract constant optimization (Korns 2010).
                 The complete pseudo code for the baseline algorithm is
                 presented in this paper. It is developed step by step
                 as enhancements to the original published SR algorithm
                 (Koza 1992) with justifications for each enhancement.
                 Before-after speed and accuracy comparisons are made
                 for each enhancement on a series of previously
                 published sample problems.",
  notes =        "part of \cite{Riolo:2012:GPTP} published after the
                 workshop in 2013",
