TEACH GAPEX Tom Khabaza 19th Feb. 1988 This teach file describes a programming exercise on automatic programming by example. The proposed program is modelled loosely on a system called GAP (for Generalising Automatic Programmer), written by Steve Hardy for his Ph.D. The ideas behind this system are described in outline in this file. For more details see the references at the end of the file. CONTENTS - (Use g to access required sections) -- What is GAP? -- How GAP did it -- The GAP micro-world -- Heuristic rules for GAP-like systems -- Exercise -- References -- What is GAP? ------------------------------------------------------- GAP was a program which would generate a simple Lisp function from an example of an input-output pair for that function. So, for example, given the input-output pair: (a b c d) -> ((a) (b) (c) (d)) GAP would generate the Lisp function: (lambda (x) (cond ((null x) nil) (t (append (list (list (car x))) (self (cdr x)))))) Note that this is a slightly unusual dialect of Lisp, in that it allows the "pseudo-function" 'self', to mean a recursive call of the current function. This avoids the need to give the generated function a name. The next section gives a highly simplified version of how GAP achieved this. -- How GAP did it ----------------------------------------------------- GAP regarded all Lisp functions as instances of the following "function schema": (lambda (x) (cond ( nil) (t (append (self ) )))) Expressions like "" and "" are called "slots". Each slot in this schema could be filled according to the following grammar: :== ) :== (list ) ) The undefined category in this grammar is capable in theory of representing any atom. In practice it need be capable only of generating atoms that occur in the example input or output. The schema and grammar together give us a space of possible Lisp function schemas, some of which are fully instantiated, representing real Lisp functions. We can think of this as a search space: each function schema is a node, and the successors of a node S are the schemas formed by instantiating the slots in S by every possible legal value according to the grammar. (The basic schema will thus have 81 immediate successors according to the grammar.) The goal nodes in this search space are those fully instantiated schemas that will produce the given output when applied to the given input. Because the grammar is recursive, some branches of the search space are infinite. A depth-first search of this space would therefore not be appropriate, but a breadth-first search would eventually produce the function we want. However, the search space is large enough that to search it exhaustively might be prohibitively expensive in time, or, with a breadth-first strategy, in space. Certainly to use brute-force search with this technique to solve any real programming problem would be unrealistic. To make this kind of program workable, it must include heuristics to guide the search by examining the properties of the example input-output pair. GAP was a complex program written in Conniver. Its schema(s) and grammar were both more complex than those given above, and its search strategy was based on complex facilities provided by Conniver, but the complexity of the original GAP program is misleading for purposes of the current exercise, because the basic idea was really quite simple. -- The GAP micro-world ------------------------------------------------ It is the provision of heuristics for program generation that constitute the most significant part of this exercise. The technique described above allows us to create a "micro-world" for automatic programming by providing a function schema and grammar, thus forming the relevant search space. Problem solving in this micro-world then consists of heuristic guidance for this search. We are thus working in the domain of "knowledge engineering for automatic programming". What kinds of knowledge can we give the program to guide the search? How should we represent this knowledge? How can we make it specific enough to be useful, yet general enough to allow the generation of a broad range of functions? -- Heuristic rules for GAP-like systems ------------------------------- What kind of heuristics might we use in a GAP-like system? Some example rules are given below. These give a very rough idea of one kind of knowledge that might be used. See the accessible references on GAP for further information. Rule 1: (cdr loop) If the input and the output are lists of the same length, then the function is probably a "cdr loop" - that is: (a) the recursive call probably takes "(cdr x)" as its argument, and (b) the test is probably "(null x)". Rule 2: (forward cdr loop) If the function is a cdr loop, and the atoms in the car of the output are all present in the car of the input, then the function is probably a "forward cdr loop" - that is: (a) the first argument of append is , and (b) the third is nil. Rule 3: (backward cdr loop) If the function is a cdr loop, and the atoms in the car of the output are all present in the last element of the input, then the function is probably a "backward cdr loop" - that is: (a) the first argument of append is nil, and (b) the third is . Rule 4: (forward cdr loop increase nesting) If the function is a forward cdr loop and the nesting of the car of the output is greater than the nesting of the car of the input, then the first argument to append is probably "(list (list ))". Rule 5: (backward cdr loop increase nesting) If the function is a backward cdr loop and the nesting of the last element of the output is greater than the nesting of the last element of the input, then the third argument to append is probably "(list (list ))". These rules might guide us towards functions like "listify" (rules 1, 2 & 4): (lambda (x) (cond ((null x) nil) (t (append (list (list x)) (self (cdr x)) nil) ) ) ) or "reverse" (rules 1, 3 and 5): (lambda (x) (cond ((null x) nil) (t (append nil (self (cdr x)) (list x) ) ) ) ) -- Exercise ----------------------------------------------------------- Write a miniature version of GAP, which performs heuristic search in a space of function schemas, using the initial schema and grammar given above. Your program will probably require the following major components: 1. A simple Lisp interpreter, used for testing that we've found the right function. Clearly it must be capable for applying simple Lisp lambda functions to given arguments, where the functions can be written in terms of "self". You can either write this yourself (it is not very difficult), or use the library "lib mgap_lisp". - see HELP * MGAP_LISP. 2. A heuristic search procedure. This could be roughly like the "ordered search" algorithm, given in Barr & Feigenbaum (1982) chapter II, except that we need not consider differential cost of arcs, since all arcs are the same cost in this search space. This algorithm is based around the idea of a "complete" search of the search space, so that eventually any node will be generated. You may find it more convenient to use a "incomplete" search technique, perhaps based on a "hill-climbing" approach, which may home in faster on promising nodes because not all successors of a node need be examined. 3. A "successors" function to be used by 2. This must take a function schema S, and generate a list of those schema that result from filling in each of S's "slots" using one rewrite by the grammar. These successors will sometimes be fully instantiated, and sometimes only partially. The Lisp interpreter cannot be expected to apply to partially instantiated schemas, but the evaluation function should be able to evaluate anything. 4. An evaluation function for function schemas. This is the location of all the heuristic knowledge, and will probably constitute the major part of the program. You have to decide what kind of knowledge representation to use here. The previous section implicitly suggests that production rules might be used, but you may be able to think of a better way... Do not be over-ambitious with what you expect your program to achieve. While it might be desirable to have quite general knowledge generating a wide variety of functions, this is probably very difficult. A more specialised program that guides the search strongly towards a small class of functions (or even towards some specific function) may well be just as interesting, because the main goal is to illustrate the use of knowledge about recursive functions, as realised in your heuristic evaluation function. -- References --------------------------------------------------------- Sources on GAP: HARDY, S. (1973) "Automatic Induction of LISP Functions", AISB Summer Conference, University of Sussex, July 1974, pp 50-62. [QZ 1240 Ais]. HARDY, S. (1974) "Synthesis of LISP functions from examples", Memo CSM-5, University of Essex Computing Centre, August 1974. Also in IJCAI 4 (1975), pp 240-245. [QZ 1240 Int]. HARDY, S. (1976) "Synthesis of LISP functions from example computations", Ph.D. Thesis, University of Essex. (Difficult to get hold of.) Source for general background on automatic programming: BARR, A. and FEIGENBAUM, E. A. (1982) "The Handbook of Artificial Intelligence", Pitman. [QZ 1240 Han]. Volume 2, Chapter X. Source for ordered search: BARR, A. and FEIGENBAUM, E. A. (1982) "The Handbook of Artificial Intelligence", Pitman. [QZ 1240 Han]. Volume 1, Chapter II.