# Genetic Programming and Data Structures

W. B. Langdon, University College, London, 1996, 350 pages

## B.7 Glossary

• Building Block A pattern of genes in a contiguous section of a chromosome which, if present, confers a high fitness to the individual. According to the building block hypothesis, a complete solution can be constructed by crossover joining together in a single individual many building blocks which where originally spread throughout the population.

• Cellular Automata A regular array of identical finite state automata whose next state is determined solely by their current state and the state of their neighbours. The most widely seen is the game of Life in which complex patterns emerge from a (supposedly infinite) square lattice of simple two state (living and dead) automata whose next state is determined solely by the current states of its four closes neighbours and itself.

• Classifiers An extension of genetic algorithms in which the population consists of a co-operating set of rules (i.e. a rulebase) which are to learn to solve a problem given a number of test cases. Between each generation the population as a whole is evaluated and a fitness is assigned to each rule using the bucket-brigade algorithm or other credit sharing scheme (e.g. the Pitt scheme). These schemes aims to reward or punish rules which contribute to a test case according to how good the total solution is by adjusting the individual rules fitness.

At the end of the test data a new generation is created using a genetic algorithm as if each rule were independent using its own fitness (measures may be taken are taken to ensure a given rule only appears once in the new population).

• Coevolution Two or more populations are evolved at the same time. Often the separate populations compete against each other.

• Convergence Tendency of members of the population to be the same. May be used to mean either their representation or behaviour are identical. Loosely a genetic algorithm solution has been reached.

• Chromosome Normally, in genetic algorithms the bit string which represents the individual. In genetic programming the individual and its representation are usually the same, both being the program parse tree. In nature many species store their genetic information on more than one chromosome.

• Crossover Creating a new individual's representation from parts of its parents' representations.

• Deme A separately evolving subset of the whole population. The subsets may be evolved on a different computers. Emigration between subset may be used (see Panmixia).

• Elitist An elitist genetic algorithm is one that always retains in the population the best individual found so far. Tournament selection is naturally elitist.

• Epistasis A term from biology used to denote that the fitness of an individual depends upon the interaction of a number of their genes. In genetic algorithms this would be indicated by the fitness containing a non-linear combination of components of the string.

• Evolution Programming A population containing a number of trial solutions each of which is evaluated to yield an error. Typically, at the end of each generation, the best half of the population is retained and a new solution is produced from each survivor. The process is continued with the aim that the population should evolve to contain an acceptable solution.

Evolutionary programming like Evolution Strategy produces new children by mutating at random from a single parent solution. The analogue components (e.g. the connection weights when applied to artificial neural networks) are changed by a gaussian function whose standard deviation is given by a function of the parent's error called its temperature. Digital components (e.g. presence of a hidden node) are created and destroyed at random.

• Evolution Strategy or Evolutionsstrategie A search technique first developed in Berlin. Each point in the search space is represented by a vector of real values. In the original Evolution Strategy, (1+1)-ES, the next point to search is given by adding gaussian random noise to the current search point. The new point is evaluated and if better the search continues from it. If not the search continues from the original point. The level of noise is automatically adjusted as the search proceeds.

Evolutionary Strategies can be thought of as like an analogue version of genetic algorithms. In (1+1)-ES, 1 parent is used to create 1 offspring. In (u+l)-ES and (u,l)-ES m parents are used to create l children (perhaps using crossover).

• Finite State Automaton (FSA) or Finite State Machine (FSM) A machine which can be totally described by a finite set of states, it being in one these at any one time, plus a set of rules which determine when it moves from one state to another.

• Fitness Function A process which evaluates a member of a population and gives it a score or fitness. In most cases the goal is to find an individual with the maximum (or minimum) fitness.

• Function Set The set of operators used in a genetic program, e.g. + - * divide. These act as the branch points in the parse tree, linking other functions or terminals. See also non-terminals.

• Generation When the children of one population replace their parents in that population. Where some part of the original population is retained, as in steady state GAs, generation typically refers to the interval during which the number of new individuals created is equal to the population size.

• Generation Equivalent In a steady state GA, the time taken to create as many new individuals as there are in the population.

• Genetic Algorithm A population containing a number of trial solutions each of which is evaluated (to yield a fitness) and a new generation is created from the better of them. The process is continued through a number of generations with the aim that the population should evolve to contain an acceptable solution.

GAs are characterised by representing the solution as an (often fixed length) string of digital symbols, selecting parents from the current population in proportion to their fitness (or some approximation of this) and the use of crossover as the dominate means of creating new members of the population. The initial population may be created at random or from some known starting point.

• GA Deceptive A gene pattern which confers high fitness but is not present in the optimal solution is said to be deceptive; in that it may lead the genetic algorithm away from the global optimum solution.

• Genetic Operator An operator in a genetic algorithm or genetic programming, which acts upon the chromosome to produce a new individual. Example operators are mutation and crossover.

• Genetic Program A program produced by genetic programming.

• Genetic Programming A subset of genetic algorithms. The members of the populations are the parse trees of computer programs whose fitness is evaluated by running them. The reproduction operators (e.g. crossover) are refined to ensure that the child is syntactically correct (some protection may be given against semantic errors too). This is achieved by acting upon subtrees.

Genetic programming is most easily implemented where the computer language is tree structured so there is no need to explicitly evaluated its parse tree. This is one of the reasons why Lisp is often used for genetic programming.

This is the common usage of the term genetic programming however it has also been used to refer to the programming of cellular automata and neural networks using a genetic algorithm.

• Hits The number of hits an individual scores is the number of test cases for which it returns the correct answer (or close enough to it). This may or may not be a component of the fitness function. When an individual gains the maximum number of hits this may terminate the run.

• Infix Notation Notation in which the operator separates its operands. E.g. (a + b) * c. Infix notation requires the use of brackets to specify the order of evaluation, unlike either prefix or postfix notations.

• Non-Terminal Functions used to link parse tree together. This name may be used to avoid confusion with functions with no parameters which can only act as end points of the parse tree (i.e. leafs) and so are part of the terminal set.

• Mutation Arbitrary change to representation, often at random. In genetic programming, a subtree is replaced by another, some or all of which is created at random.

• Panmixia When a population is split into a number of separately evolving populations (demes) but the level of emigration is sufficiently high that they continue to evolve as if a single population.

• Parsimony Brevity. In GP, this is measured by counting the nodes in the tree. The smaller the program, the smaller the tree, the lower the count and the more parsimonious it is.

• Postfix Notation Reverse Polish Notation or Suffix Notation Notation in which the operator follows its operands. E.g. a + b * c represented as abc*+.

• Prefix Notation Polish Notation Notation in which the operator comes before its operands. E.g. a + b represented as +ab.

• Premature Convergence When a genetic algorithm's population converges to something which is not the solution you wanted.

• Recombination as crossover.

• Reproduction Production of new member of population from existing members. May be used to mean an exact copy of the original member.

• Simulated Annealing Search technique where a single trial solution is modified at random. An energy is defined which represents how good the solution is. The goal is to find the best solution by minimising the energy. Changes which lead to a lower energy are always accepted; an increase is probabilistically accepted. The probability is given by exp(-Delta E/kT). Where Delta E is the change in energy, k is a constant and T is the Temperature. Initially the temperature is high corresponding to a liquid or molten state where large changes are possible and it is progressively reduced using a cooling schedule so allowing smaller changes until the system solidifies at a low energy solution.

• Stochastic Random or probabilistic but with some direction. For example the arrival of people at a post office might be random but average properties (such as the queue length) can be predicted.

• Terminal Set A set from which all end (leaf) nodes in the parse trees representing the programs must be drawn. A terminal might be a variable, a constant or a function with no arguments.

• Tournament Selection A mechanism for choosing individuals from a population. A group (typically between 2 and 7 individuals) are selected at random from the population and the best (normally only one, but possibly more) is chosen.

Glossary of alife terms from "Adaptive Individuals in Evolving Populations: Models and Algorithms".

W.B.Langdon@cs.bham.ac.uk 2 March 1997