Genetic Programming and Data Structures

W. B. Langdon, University College, London, 1996, 350 pages

Abstract

This thesis investigates the evolution and use of abstract data types within Genetic Programming (GP). In genetic programming the principles of natural evolution (fitness based selection and recombination) act on program code to automatically generate computer programs. The research in this thesis is motivated by the observation from software engineering that data abstraction (e.g. via abstract data types) is essential in programs created by human programmers. We investigate whether abstract data types can be similarly beneficial to the automatic production of programs using GP.

GP can automatically "evolve" programs which solve non-trivial problems, but few experiments have been reported in which the evolved programs explicitly manipulate memory, even though memory is an essential component of most computer programs. So far, work on evolving programs that explicitly use memory has principally used either problem-specific memory models or a simple indexed memory model consisting of a single global shared array. Whilst the latter is potentially sufficient to allow any computation to evolve, it is unstructured and allows complex interactions between parts of programs, which weaken their modularity. In software engineering this is addressed by controlled use of memory using scoping rules and abstract data types, such as stacks, queues and files.
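The contrast between the two memory models can be illustrated with a minimal hand-written sketch. It is not taken from the thesis: the class names, the `read`/`write` primitives, the array size and the wrap-around of out-of-range indices are all assumptions made for illustration.

```python
class IndexedMemory:
    """Hypothetical indexed memory: one global array shared by all code."""

    def __init__(self, size=4):
        self.cells = [0] * size

    def read(self, index):
        # Assumption: out-of-range indices wrap around rather than fail.
        return self.cells[index % len(self.cells)]

    def write(self, index, value):
        self.cells[index % len(self.cells)] = value


class Stack:
    """Hypothetical stack ADT: state is reachable only via its operations."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)

    def pop(self):
        return self._items.pop()

    def top(self):
        return self._items[-1]

    def is_empty(self):
        return not self._items


# With indexed memory, two unrelated parts of a program can interfere:
memory = IndexedMemory(size=4)
memory.write(1, 10)   # part A stores a value in cell 1
memory.write(5, 99)   # part B writes "cell 5", which wraps onto cell 1
clobbered = memory.read(1)  # part A no longer sees the value it stored

# With a stack, access is restricted to the top element:
stack = Stack()
stack.push(1)
stack.push(2)
last_in = stack.pop()
```

In the sketch, any part of a program can overwrite any cell of the shared array, whereas the stack exposes only a small, fixed set of operations.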

This thesis makes five main contributions:

(1) Proving that abstract data types (stacks, queues and lists) can be evolved using genetic programming.

(2) Demonstrating that GP can evolve general programs which recognise a Dyck context free language and evaluate Reverse Polish expressions, and that GP with an appropriate memory structure can solve the nested brackets problem, which had previously been solved using a hybrid GP.

(3) In these three cases (Dyck, expression evaluation and nested brackets) an appropriate data structure is proved to be beneficial compared to indexed memory.

(4) Investigations of real world electrical network maintenance scheduling problems demonstrate that Genetic Algorithms can find low cost viable solutions to such problems.

(5) A taxonomy of GP is presented, including a critical review of experiments with evolving memory.

These contributions support our thesis that data abstraction can be beneficial to automatic program generation via artificial evolution.
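To make the problems named in contributions (2) and (3) concrete, the following hand-written sketch shows why a stack is a natural memory structure for them. These are not the evolved programs from the thesis: the function names, the bracket alphabet and the token format are assumptions chosen for illustration.

```python
# Assumed bracket alphabet: each closing bracket maps to its opener.
PAIRS = {")": "(", "]": "[", "}": "{"}


def is_dyck(text):
    """Return True if text is a correctly nested sequence of brackets."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)            # remember the open bracket
        elif ch in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False                # symbol outside the alphabet
    return not stack                    # no bracket may remain open


def eval_rpn(tokens):
    """Evaluate a Reverse Polish expression over +, -, * and /."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in tokens:
        if token in ops:
            right = stack.pop()         # operands come off in reverse order
            left = stack.pop()
            stack.append(ops[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]
```

For example, `is_dyck("([]{})")` accepts the string while `is_dyck("([)]")` rejects it, and `eval_rpn("3 4 + 2 *".split())` computes (3 + 4) * 2. In both functions the only state needed is the most recently stored item, which is exactly what a stack provides.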

Summary

Abstract, Acknowledgement, Table of Contents, List of Figures, List of Tables 1-20
Chapter 1 Introduction 21-27
Chapter 2 Genetic Programming, Background to this thesis, GP Concepts 29-55
Chapter 3 Evolving a Stack 57-79
Chapter 4 Evolving a Queue 81-125
Chapter 5 Evolving a List 127-149
Chapter 6 Problems Solved Using Data Structures 151-175
Balanced bracket problem, Dyck Language, Evolving a four function calculator, Survey of GP and Evolvable memory
Chapter 7 Evolution of GP Populations 177-224
Price's Theorem, Fisher's Theorem, Evolution of populations, Loss of Variety, Crossover's Effects
Chapter 8 Conclusions 225-228
Bibliography, 256 References 229-259
Appendix A Number of Fitness Evaluations Required 261-262
Appendix B Genetic Programming - Computers using "Natural Selection" to Generate Programs, Survey of GP 263-294
Glossary 295-299
Appendix C Scheduling Planned Maintenance of the National Grid, Java demo 301-321
Appendix D Scheduling Maintenance of the South Wales Network 323-337
Appendix E Implementation 339-343
GP-QUICK, Network Running, Reusing Ancestors Fitness Information, Caches, Compressing the Check Point File, Code
Index 572 entries 345-350

W.B.Langdon@cs.bham.ac.uk 16 February 1997