Theory and Practice for Efficient Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Leonardo Vanneschi",
  title =        "Theory and Practice for Efficient Genetic
  school =       "Faculty of Sciences, University of Lausanne",
  year =         "2004",
  address =      "Switzerland",
  email =        "",
  keywords =     "genetic algorithms, genetic programming",
  URL =          "",
  broken =       "",
  size =         "342 pages",
  abstract =     "Genetic programming is a machine learning technique to
                 automatically create computer programs from high-level
                 specifications of a problem. It achieves this goal by
                 genetically breeding a population of computer programs
                 using the principles of Darwinian natural selection and
                 biologically inspired operations. Selection is obtained
                 by attaching to each program a fitness value, which
                 quantifies how well it solves the problem. In spite of
                 the numerous practical successes of genetic programming
                 and the good quality of theory that has been developed
                 until nowadays, some unsolved problems persist. In
                 synthesis: methodologies to predict, or to measure, the
                 difficulty of problems have not been developed yet
                 (i.e. no technique to measure the capability of genetic
                 programming to find good solutions for a given problem
                 exists) and genetic programming is, in general, a slow
                 and resource consuming process. Proposing solutions to
                 these problems is the guiding thread of this

                 Two measures of problem difficulty are
                 presented:fitness distance correlation (based on the
                 idea that what makes a problem difficult or easy for
                 genetic programming is the relationship between fitness
                 and distance to the goal) and negative slope
                 coefficient (which quantifies some aspects of
                 evolvability, i.e. the ability of genetic operators to
                 produce offsprings that are fitter than their parents).
                 These measures are based on statistical sampling of the
                 search space. Both of them succeed in correctly
                 measuring the difficulty of a wide range of different
                 problems. Advantages and drawbacks of these measures
                 are discussed in depth. Furthermore, a discussion of
                 the concept of fitness landscape, on which these two
                 measures are inspired, is proposed.

                 Among the main causes of inefficiency of genetic
                 programming, one may mention: premature convergence (or
                 the tendency to produce populations in which all the
                 individuals have similar characteristics), bloat (or
                 progressive individuals' code growth) and the fact that
                 fitness evaluation often requires the execution of
                 programs on many different input data (known as fitness
                 cases). This thesis shows how distributing individuals
                 into separate communicating subpopulations naturally
                 counteracts premature convergence and bloat, allowing
                 genetic programming to find solutions of better
                 quality, more quickly. The advantage of parallelising
                 genetic programming is twofold: on the one hand it
                 enables to achieve time savings by distributing the
                 computational effort on a set of calculating agents, on
                 the other hand, the parallel setting offers benefits
                 from the algorithmic point of view, in analogy with the
                 natural parallel evolution of spatially distributed
                 populations. Furthermore, techniques to dynamically
                 tune the size of populations, and to limit the number
                 of fitness cases to be tested, in order to save
                 computational effort, are proposed.",

Genetic Programming entries for Leonardo Vanneschi