Benchmarking the Generalization Capabilities of a Compiling Genetic programming System using Sparse Data Sets

Created by W.Langdon from gp-bibliography.bib Revision:1.4216

  author =       "Frank D. Francone and Peter Nordin and 
                 Wolfgang Banzhaf",
  title =        "Benchmarking the Generalization Capabilities of a
                 Compiling Genetic programming System using Sparse Data
                 Sets",
  booktitle =    "Genetic Programming 1996: Proceedings of the First
                 Annual Conference",
  editor =       "John R. Koza and David E. Goldberg and 
                 David B. Fogel and Rick L. Riolo",
  year =         "1996",
  month =        "28--31 " # jul,
  keywords =     "genetic algorithms, genetic programming",
  pages =        "72--80",
  address =      "Stanford University, CA, USA",
  publisher =    "MIT Press",
  URL =          "",
  size =         "9 pages",
  notes =        "GP-96 Notes based upon version submitted to GP-96

                 Wed, 17 Apr 1996 09:20:19 PDT

                 When I read your email (Koza's), I went back and
                 checked the output on two other problems that we ran as
                 part of that paper. Gaussian 3D and Phoneme
                 Classification. Each of these was a two output problem
                 and the way the classification was set up, one would
                 expect less than 50% correct classification from a
                 randomly created individual.

                 In those problems, we used 10 different random seeds,
                 3000 individuals per run. The following were the
                 classification rates of the best individual in
                 generation 0.

                 Problem   Mean   Best   Worst
                 gauss     0.59   0.64   0.55
                 iris      0.98   0.99   0.97
                 phoneme   0.73   0.75   0.71

                 Note that these figures represent the results of a
                 random search of 30,000 individuals.

                 As Peter Nordin points out in his email to which this
                 is a reply, on the IRIS problem, even the worst figure
                 is very good. In fact, it was statistically
                 indistinguishable from a highly optimized KNN benchmark
                 run on twice as large a training set. This is because
                 the IRIS problem is trivial. As pointed out in the
                 above referenced paper, IRIS should probably not be
                 used as a measure of the learning ability of any ML
                 system, notwithstanding its status as a 'classic'
                 problem. It is probably better characterized as a
                 'classic' way to make a ML system look good.

                 On the other two problems, which were much more
                 difficult, the genetic search improved on the random
                 search considerably. The individuals with the best
                 ability to generalize on the test data set achieved:

                 Problem       Best Generalizer
                 Gaussian 3D   72%
                 Phoneme       85%

                 I report these figures here because the generation 0
                 figures are not reported in the above paper.


                 Frank Francone

