Extremely Accurate Symbolic Regression for Large Feature Problems

  author =       "Michael F. Korns",
  title =        "Extremely Accurate Symbolic Regression for Large
                 Feature Problems",
  booktitle =    "Genetic Programming Theory and Practice XII",
  year =         "2014",
  editor =       "Rick Riolo and William P. Worzel and Mark Kotanchek",
  series =       "Genetic and Evolutionary Computation",
  pages =        "109--131",
  address =      "Ann Arbor, USA",
  month =        "8-10 " # may,
  publisher =    "Springer",
  keywords =     "genetic algorithms, genetic programming, Abstract
                 expression grammars, Grammar template genetic
                 programming, Particle swarm, Symbolic regression",
  isbn13 =       "978-3-319-16029-0",
  DOI =          "doi:10.1007/978-3-319-16030-6_7",
  abstract =     "As symbolic regression (SR) has advanced into the
                 early stages of commercial exploitation, the poor
                 accuracy of SR, still plaguing even the most advanced
                 commercial packages, has become an issue for early
                 adopters. Users expect to have the correct formula
                 returned, especially in cases with zero noise and only
                 one basis function with minimally complex grammar.

                 At a minimum, users expect the response surface of the
                 SR tool to be easily understood, so that the user can
                 know a priori on what classes of problems to expect
                 excellent, average, or poor accuracy. Poor or unknown
                 accuracy is a hindrance to greater academic and
                 industrial acceptance of SR tools.

                 In a previous paper, we published a complex algorithm
                 for modern symbolic regression which is extremely
                 accurate for a large class of Symbolic Regression
                 problems. The class of problems, on which SR is
                 extremely accurate, was described in detail. This
                 algorithm was extremely accurate, on a single
                 processor, for up to 25 features (columns); and, a
                 cloud configuration was used to extend the extreme
                 accuracy up to as many as 100 features.

                 While the previous algorithm's extreme accuracy for
                 deep problems with a small number of features was an
                 impressive advance, there are many very important
                 academic and industrial SR problems requiring from 100
                 to 1000 features.

                 In this chapter we extend the previous algorithm such
                 that high accuracy is achieved on a wide range of
                 problems, from 25 to 3000 features, using only a single
                 processor. The class of problems, on which the enhanced
                 algorithm is highly accurate, is described in detail. A
                 definition of extreme accuracy is provided, and an
                 informal argument for high SR accuracy is outlined in
                 this chapter.

                 The new enhanced algorithm is tested on a set of
                 representative problems. The enhanced algorithm is
                 shown to be robust, performing well even in the face of
                 testing data containing up to 3000 features.",
  notes =        "http://cscs.umich.edu/gptp-workshops/

                 Part of \cite{Riolo:2014:GPTP} published after the
                 workshop in 2015",

