W B Langdon's Papers available via ftp

W B Langdon's Papers available via ftp

W.B.Langdon@cs.bham.ac.uk . 2 Sept 1998

The following papers are available via anonymous ftp from two main directories and sub-directories (README files):

Books

Papers etc.

Informal documents available via ftp

Jointly Edited

Joint papers available via ftp


Why "Building Blocks" Don't Work on Parity Problems

Technical report CSRP-98-17 (html).

ABSTRACT

We investigate the distribution of performance of the parity and always-on Boolean functions given only the appropriate building block. These problems have ``needle in a haystack'' fitness landscape and so are unsuitable for genetic programming or other progressive search techniques. Theoretical analysis shows in the limit as program size grows the density of solutions is independent of size but falls exponentially with number of inputs.

Bibliographic details.


Boolean Functions Fitness Spaces

Late Breaking paper at GP-98 (html) (poster). Also available as technical report CSRP-98-16

ABSTRACT

We investigate the distribution of performance of the Boolean functions of 3 Boolean inputs (particularly that of the parity functions), the always-on-6 and even-6 parity functions. We us enumeration, uniform Monte-Carlo random sampling and sampling random full trees. As expected XOR dramatically changes the fitness distributions. In all cases once some minimum size threshold has been exceeded, the distribution of performance is approximately independent of program length. However the distribution of the performance of full trees is different from that of asymmetric trees and varies with tree depth.

We consider but reject testing the No Free Lunch (NFL) theorems on these functions.

Bibliographic details.


Genetic Programming in Europe

Genetic Programming in Europe

Report of the EvoGP Working Group on Genetic Programming of the European Network of Excellence in Evolutionary Computing


Better Trained Ants for Genetic Programming

Better Trained Ants (html) Co author Riccardo Poli

ABSTRACT

The problem of programming an artificial ant to follow the Santa Fe trail has been repeatedly used as a benchmark problem in GP. Recently we have shown performance of several techniques is not much better than the best performance obtainable using uniform random search. We suggested that this could be because the program fitness landscape is difficult for hill climbers and the problem is also difficult for Genetic Algorithms as it contains multiple levels of deception.

Here we redefine the problem so the ant is (1) obliged to traverse the trail in approximately the correct order, (2) to find food quickly. We also investigate (3) including the ant's speed in the fitness function, either as a linear addition or as a second objective in a multi-objective fitness function, and (4) GP one point crossover.

A simple genetic programming system, with no size or depth restriction, is shown to perform approximately three times better with the improved training function. (Extends CSRP-98-08)

Bibliographic details.


Better Trained Ants

Better Trained Ants (html) Late breaking paper at EuroGP '98.

ABSTRACT

The problem of programming an artificial ant to follow the Santa~Fe trail has been repeatedly used as a benchmark problem. Recently we have shown performance of several techniques is not much better than the best performance obtainable using uniform random search. We suggested that this could be because the program fitness landscape is difficult for hill climbers and the problem is also difficult for Genetic Algorithms as it contains multiple levels of deception.

Here we redefine the problem so the ant is obliged to traverse the trail in approximately the correct order. A simple genetic programming system, with no size or depth restriction, is shown to perform approximately three times better with the improved training function.

Bibliographic details.


Why Ants are Hard

Why Ants are Hard (CSRP-98-04 html) Presented at GP-98.

ABSTRACT

The problem of programming an artificial ant to follow the Santa Fe trail is used as an example program search space. Analysis of shorter solutions shows they have many of the characteristics often ascribed to manually coded programs. Enumeration of a small fraction of the total search space and random sampling characterise it as rugged with many multiple plateaus split by deep valleys and many local and global optima. This suggests it is difficult for hill climbing algorithms. Analysis of the program search space in terms of fixed length schema suggests it is highly deceptive and that for the simplest solutions large building blocks must be assembled before they have above average fitness. In some cases we show solutions cannot be assembled using a fixed representation from small building blocks of above average fitness. These suggest the Ant problem is difficult for Genetic Algorithms.

Random sampling of the program search space suggests on average the density of global optima changes only slowly with program size but the density of neutral networks linking points of the same fitness grows approximately linearly with program length. This is part of the cause of bloat.

Previously reported genetic programming, simulated annealing and hill climbing performance is shown not to be much better than random search on the Ant problem.

Bibliographic details.


Genetic Programming Bloat with Dynamic Fitness

Genetic Programming Bloat with Dynamic Fitness (html) In EuroGP '98.

ABSTRACT

In artificial evolution individuals which perform as their parents are usually rewarded identically to their parents. We note that Nature is more dynamic and there may be a penalty to pay for doing the same thing as your parents. We report two sets of experiments where static fitness functions are firstly augmented by a penalty for unchanged offspring and secondly the static fitness case is replaced by randomly generated dynamic test cases. We conclude genetic programming, when evolving artificial ant control programs, is surprisingly little effected by large penalties and program growth is observed in all our experiments.

Bibliographic details.


Program Growth in Simulated Annealing, Hill Climbing and Populations

The Evolution of Size in Variable Length Representations to appear in ICEC '98.

ABSTRACT

In many cases programs length's increase (known as "bloat", "fluff" and increasing "structural complexity") during artificial evolution. We show bloat is not specific to genetic programming and suggest it is inherent in search techniques with discrete variable length representations using simple static evaluation functions. We investigate the bloating characteristics of three non-population and one population based search techniques using a novel mutation operator.

An artificial ant following the Santa Fe trail problem is solved by simulated annealing, hill climbing, strict hill climbing and population based search using two variants of the the new subtree based mutation operator. As predicted bloat is observed when using unbiased mutation and is absent in simulated annealing and both hill climbers when using the length neutral mutation however bloat occurs with both mutations when using a population.

We conclude that there are two causes of bloat 1) search operators with no length bias tend to sample bigger trees and 2) competition within populations favours longer programs as they can usually reproduce more accurately.

Bibliographic details.


Fitness Causes Bloat: Simulated Annealing, Hill Climbing and Populations

Fitness Causes Bloat: Simulated Annealing, Hill Climbing and Populations. Technical report CSRP-97-22

ABSTRACT

In many cases programs length's increase (known as ``bloat'', ``fluff'' and increasing ``structural complexity'') during artificial evolution. We show bloat is not specific to genetic programming and suggest it is inherent in search techniques with discrete variable length representations using simple static evaluation functions. We investigate the bloating characteristics of three non-population and one population based search techniques using a novel mutation operator.

An artificial ant following the Santa Fe trail problem is solved by simulated annealing, hill climbing, strict hill climbing and population based search using two variants of the the new subtree based mutation operator. As predicted bloat is observed when using unbiased mutation and is absent in simulated annealing and both hill climbers when using the length neutral mutation however bloat occurs with both mutations when using a population.

We conclude that there are two causes of bloat.

Bibliographic details.


Fitness Causes Bloat: Mutation

Fitness Causes Bloat: Mutation

Presented at EuroGP '98. Late Breaking paper at GP-97 (poster). Also available as technical report CSRP-97-16

ABSTRACT

The problem of evolving, using mutation, an artificial ant to follow the Santa Fe trail is used to study the well known genetic programming feature of growth in solution length. Known variously as ``bloat'', ``fluff'' and increasing ``structural complexity'', this is often described in terms of increasing ``redundancy'' in the code caused by ``introns''.

Comparison between runs with and without fitness selection pressure, backed by Price's Theorem, shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation. With simple static evaluation search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones. Thus in search (without a length bias) we expect longer representations to occur more often and so representation length to tend to increase. I.e. fitness based selection leads to bloat.

Bibliographic details.


Fitness Causes Bloat in Variable Size Representation

Fitness Causes Bloat in Variable Size Representation

Position paper at Workshop on Evolutionary Computation with Variable Size Representation at ICGA-97, 20 July 1997, East Lansing, MI, USA. (Presentation slide)

ABSTRACT

We argue based upon the numbers of representations of given length, that increase in representation length is inherent in using a fixed evaluation function with a discrete but variable length representation. Two examples of this are analysed, including the use of Price's Theorem. Both examples confirm the tendency for solutions to grow in size is caused by fitness based selection.

Bibliographic details.


Fitness Causes Bloat

Fitness Causes Bloat (html)

Co author Riccardo Poli

Second best paper overall award at WSC2 (discussion at WSC2). Updates technical report CSRP-97-09

ABSTRACT

The problem of evolving an artificial ant to follow the Santa Fe trail is used to demonstrate the well known genetic programming feature of growth in solution length. Known variously as ``bloat'', ``redundancy'', ``introns'', ``fluff'', ``Structural Complexity'' with antonyms ``parsimony'', ``Minimum Description Length'' (MDL) and ``Occam's razor''. Comparison with runs with and without fitness selection pressure shows the tendency for solutions to grow in size is caused by fitness based selection. We argue that such growth is inherent in using a fixed evaluation function with a discrete but variable length representation. Since with simple static evaluation search converges to mainly finding trial solutions with the same fitness as existing trial solutions. In general variable length allows many more long representations of a given solution than short ones of the same solution. Thus with an unbiased random search we expect longer representations to occur more often and so representation length tends to increase. I.e. fitness based selection leads to bloat.

Bibliographic details.


An Analysis of the MAX Problem in Genetic Programming

An Analysis of the MAX Problem in Genetic Programming (in GP-97, presentation slides).

Co author Riccardo Poli

ABSTRACT

We present a detailed analysis of the evolution of GP populations using the problem of finding a program which returns the maximum possible value for a given terminal and function set and a depth limit on the program tree (known as the MAX problem). We confirm the basic message of \cite{Gathercole:1996:aicrtd} that crossover together with program size restrictions can be responsible for premature convergence to a sub-optimal solution. We show that this can happen even when the population retains a high level of variety and show that in many cases evolution from the sub-optimal solution to the solution is possible if sufficient time is allowed. In both cases theoretical models are presented and compared with actual runs.

Price's Covariance and Selection Theorem is experimentally tested on GP populations. It is shown to hold only in some cases, in others program size restrictions cause important deviation from its predictions.

Considerable update of CSRP-97-04

Bibliographic details.


Evolution of Genetic Programming Populations

Evolution of Genetic Programming Populations.

ABSTRACT

We investigate in detail what happens as genetic programming (GP) populations evolve. Since we shall use the populations which showed GP can evolve stack data structures as examples, we start in Section 1 by briefly describing the stack experiment In Section 2 we show Price's Covariance and Selection Theorem can be applied to Genetic Algorithms (GAs) and GP to predict changes in gene frequencies. We follow the proof of the theorem with experimental justification using the GP runs from the stack problem. Section 3 briefly describes Fisher's Fundamental Theorem of Natural Selection and shows in its normal interpretation it does not apply to practical GAs.

An analysis of the stack populations, in Section 4, explains that the difficulty of the stack problem is due to the presence of ``deceptive'' high scoring partial solutions in the population. These cause a negative correlation between necessary primitives and fitness. As Price's Theorem predicts, the frequency of necessary primitives falls, eventually leading to their extinction and so to the impossibility of finding solutions like those that are evolved in successful runs.

Section 4 investigates the evolution of variety in GP populations. Detailed measurements of the evolution of variety in stack populations reveal loss of diversity causing crossover to produce offspring which are copies of their parents. Section 5 concludes with measurements that show in the stack population crossover readily produces improvements in performance initially but later no improvements at all are made by crossover.

Section 6 discusses the importance of these results to GP in general.

Bibliographic details.


Using Data Structures within Genetic Programming

Using Data Structures with Genetic Programming (presented at GP-96. Video tape available, V569I-96).

ABSTRACT

In earlier work we showed that GP can automatically generate simple data types (stacks, queues and lists). The results presented herein show, in some cases, provision of appropriately structured memory can indeed be advantageous to GP in comparison with directly addressable indexed memory.

Three ``classic'' problems are solved. The first two require the GP to distinguish between sentences that are in a language and those that are not given positive and negative training examples of the language. The two languages are, correctly nested brackets and a Dyck language (correctly nested brackets of different types). The third problem is to evaluate integer Reverse Polish (postfix) expressions.

Comparisons are made between GP attempting to solve these problems when provided with indexed memory or with stack data structures.

The test cases are available. Bibliographic details.


Data Structures and Genetic Programming

in Advances in Genetic Programming 2

ABSTRACT

In real world applications, software engineers recognise the use of memory must be organised via data structures and that software using the data must be independent of the data structures' implementation details. They achieve this by using abstract data structures, such as records, files and buffers.

We demonstrate that genetic programming can automatically implement simple abstract data structures, considering in detail the task of evolving a list. We show general and reasonably efficient implementations can be automatically generated from simple primitives.

A model for maintaining evolved code is demonstrated using the list problem.

Code is available. Bibliographic details. Original abstract, (May 1995) 4 pages.


Genetic Programming -- Computers using "Natural Selection" to generate programs

RN/95/76

co-author Adil Qureshi

ABSTRACT

Computers that ``program themselves''; science fact or fiction? Genetic Programming uses novel optimisation techniques to ``evolve'' simple programs; mimicking the way humans construct programs by progressively re-writing them. Trial programs are repeatedly modified in the search for ``better/fitter'' solutions. The underlying basis is Genetic Algorithms (GAs).

Genetic Algorithms, pioneered by Holland, Goldberg and others, are evolutionary search techniques inspired by natural selection (i.e\ survival of the fittest). GAs work with a ``population'' of trial solutions to a problem, frequently encoded as strings, and repeatedly select the ``fitter'' solutions, attempting to evolve better ones. The power of GAs is being demonstrated for an increasing range of applications; financial, imaging, VLSI circuit layout, gas pipeline control and production scheduling. But one of the most intriguing uses of GAs - driven by Koza - is automatic program generation.

Genetic Programming applies GAs to a ``population'' of programs - typically encoded as tree-structures. Trial programs are evaluated against a ``fitness function'' and the best solutions selected for modification and re-evaluation. This modification-evaluation cycle is repeated until a ``correct'' program is produced. GP has demonstrated its potential by evolving simple programs for medical signal filters, classifying news stories, performing optical character recognition, and for target identification.

This paper surveys the exciting field of Genetic Programming. As a basis it reviews Genetic Algorithms and automatic program generation. Next it introduces Genetic Programming, describing its history and describing the technique via a worked example in C. Then using a taxonomy that divides GP research into theory/techniques and applications, it surveys recent work from both of these perspectives.

Extensive bibliographies, glossaries and a resource list are included as appendices. },

Code is available. Bibliographic details.


Grow Your Own Programs (hard copy only)

Code is available. Bibliographic details.


Evolving Data Structures with Genetic Programming

Presented at ICGA-95

ABSTRACT

Genetic programming (GP) is a subclass of genetic algorithms (GAs), in which evolving programs are directly represented in the chromosome as trees. Recently it has been shown that programs which explicitly use directly addressable memory can be generated using GP.

It is established good software engineering practice to ensure that programs use memory via abstract data structures such as stacks, queues and lists. These provide an interface between the program and memory, freeing the program of memory management details which are left to the data structures to implement. The main result presented herein is that GP can automatically generate stacks and queues.

Typically abstract data structures support multiple operations, such as put and get. We show that GP can simultaneously evolve all the operations of a data structure by implementing each such operation with its own independent program tree. That is, the chromosome consists of a fixed number of independent program trees. Moreover, crossover only mixes genetic material of program trees that implement the same operation. Program trees interact with each other only via shared memory and shared ``Automatically Defined Functions'' (ADFs).

ADFs, ``pass by reference'' when calling them, Pareto selection, ``good software engineering practice'' and partitioning the genetic population into ``demes'' where also investigated whilst evolving the queue in order to improve the GP solutions.

Code is available. Bibliographic details.


Scheduling Maintenance of Electrical Power Transmission Networks Using Genetic Algorithms

CSRP-97-22 to be published in russian.

ABSTRACT

Summary of Genetic Algorithm and Genetic programming work on evolving schedules for preventative maintenance of the South Wales region of the UK high voltage electricity network.

Bibliographic details.


Scheduling Maintenance of Electrical Power Transmission Networks Using Genetic Programming

RN/96/49 published in GP-96 late breaking papers
and at The 1st Online Workshop on Soft Computing Special Session on Engineering Problem Solving Using Soft Computing.
A modified version appears in Chapter 10 of Artificial Intelligence Techniques In Power Systems.

ABSTRACT

National Grid Plc is responsible for the maintenance of the high voltage electricity transmission network in England and Wales. It must plan maintenance so as to minimize costs taking into account:

Previous work showed the combination of a Genetic Algorithm using an order or permutation chromosome combined with hand coded ``Greedy'' Optimizers can readily produce an optimal schedule for a four node test problem. Following this the same GA has been used to find low cost schedules for the South Wales region of the UK high voltage power network.

This paper describes the evolution of the best known schedule for the base South Wales problem using Genetic Programming starting from the hand coded heuristics used with the GA.

Bibliographic details.


Scheduling Planned Maintenance of the South Wales High-Voltage Regional Electricity Network

Presentation at the Fourth AISB Workshop on EVOLUTIONARY COMPUTING (published by Springer-Verlag as LNCS 1305, submitted version)

ABSTRACT

This paper builds on previous work, applying the combination of a greedy scheduler and permutation GA to the South Wales region of the UK National Grid. In this paper the work is extended to consider continuing to supply electricity is the presence of faults on the network.

Bibliographic details.

Also available as Technical Report CSRP-96-18


Scheduling Planned Maintenance of the National Grid

published by Springer-Verlag as LNCS 993: Evolutionary Computing

Version presented at the AISB Workshop on Evolutionary Computation, 3-4 April 1995

ABSTRACT

National Grid Plc is responsible for the maintenance of the high voltage electricity transmission network in England and Wales. It must plan maintenance so as to minimize costs taking into account:

This complex optimization and scheduling problem is currently performed manually by National Grid's Planning Engineers. Computerized viability checks are performed on the schedules they produce. National Grid's Technology and Science Laboratories and UCL aim to generate low cost schedules using Genetic Algorithms. It is hoped this will aid the Planning Engineers.

This paper reports work in progress. So far:

We are now considering how to scale up. The computational complexity of the best of the greedy optimizers is a concern. Our plans are to consider alternative approaches (such as expansive coding and Genetic Programming) before moving on to a significant section of the whole of the National Grid (South Wales has been suggested as a suitable example of intermediate size).

Code is available Bibliographic details.


Pareto Optimality, Population Partitioning, Price's theorem and Genetic Programming

Accepted for presentation at the AAAI Fall 1995 Symposium on Genetic Programming however it was withdrawn due to pressure of time. This document is as it was submitted, 11 pages.

ABSTRACT

A description of a use of Pareto optimality in genetic programming is given and an analogy with Genetic Algorithm fitness niches is drawn. Techniques to either spread the population across many Pareto optimal fitness values or to reduce the spread are described. It is speculated that a wide spread may not aid Genetic Programming. It is suggested that this might give useful insight into many GPs whose fitness is composed of several sub-objectives.

The successful use of demic populations in GP leads to speculation that smaller evolutionary steps might aid GP in the long run.

An example is given where Price's covariance theorem helped when designing a GP fitness function.

Bibliographic details.


Directed Crossover within Genetic Programming

paper four pages.

ABSTRACT

Describes in detail a mechanism used to bias the choice of crossover locations when evolving a list data structure using genetic programming. The data structure and it evolution will be described in Advances in Genetic Programming 2.

The second section describes current research on biasing the action of reproduction operators within the genetic programming field.

Bibliographic details.


Summary of NK landscapes and GAs

An extended version of the summary that was posted to the Genetic Algorithms Digest on 12 November 1994. I would welcome comments.

ga-nk-summary.txt 300 lines


Summary of Seeding Population GAs

A version of the summary that was posted to the Genetic Algorithms Digest March 1995. I would welcome comments.

ga-seed.txt 150 lines


Quick Intro to simple-gp.c

IN/95/2 14 pages

ABSTRACT

Wouldn't it be great if computers could actually write the programs? This is the promise of genetic programming. This document gives an introduction to the program simple-gp.c which demonstrates the principles of genetic programming in the C language. This program was designed to show the ideas behind GP, ie to be tinkered with rather than to be a definitive implementation.

The appendix contains a glossary of evolutionary computing terms

Code. Bibliographic details.