^{1}University of Cambridge Computer Laboratory
J.J. Thomson
Avenue, Cambridge, CB3 0FD, England, UK
http://www.cl.cam.ac.uk/~mj201
^{2}School of Computer Science, The University of Birmingham
Birmingham B15 2TT, England, UK
http://www.cs.bham.ac.uk/~mmk
^{3}Fachbereich Informatik, Universität des Saarlandes,
66041 Saarbrücken, Germany
http://www.ags.unisb.de/~pollet
We devised a framework within which a proof planning [pub349] system can learn frequently occurring patterns of reasoning automatically from a number of typical examples, and then use them in proving new theorems [jkpb02tr]. The availability of such patterns, captured as proof methods in a proof planning system, reduces search and proof length. We implemented this learning framework for the proof planner Omega [omega], and present it in this paper  we call our system LearnOmatic. The entire process of learning and using new proof methods in LearnOmatic consists of the following steps:
The methods we aim to learn are complex and are beyond the complexity
that can typically be tackled in the field of machine learning.
Therefore, we simplify the problem and aim to learn method
outlines, which are expressed in the following language L, where
P is a set of known identifiers of primitive methods used in a
method that is being learnt:
"[" and "]" are auxiliary symbols used to separate subexpressions, "," denotes a sequence, "" denotes a disjunction, "*" denotes a repetition of a subexpression any number of times (including 0), n a fixed number of times, and T is a constructor for a branching point (list is a list of branches), i.e., for proofs which are not sequences but branch into a tree. For more information on the choice of this language, the reader is referred to [jkpb02tr].
Here is an example from group theory of a simplify method outline which applies the associativity left method, and then reduces the theorem by applying appropriate inverse and identity methods: [assocl ^{*}, [invr  invl ], idl ].
The algorithm is based on the generalisation of the simultaneous compression of wellchosen examples. Here is just an abstract description of the learning algorithm, but the detailed steps with examples of how they are applied can be found in [jkpb02tr]:
There are some disadvantages to our technique, mostly related to the run time of the algorithm relative to the length of the examples considered for learning. The algorithm can deal with relatively small examples, which we encounter in our application domain, in an optimal way. The complexity of the algorithm is exponential in the worst case. Hence, we use some heuristics for large and badly behaved examples [jkpb02tr].
The applicability test performs a depth first search on the learnt outline. Besides the choice points from the operators of the outline language, i.e., disjunctions and number of repetitions for the Kleene operator, there can be more than one goal where a method of the learnt outline can be applied. Additionally, for methods containing parameters, an instantiation has to be chosen. The parameters of a method are instantiated by control rules that guide the proof search. Every control rule that gives an instantiation of parameters for the current method is evaluated and the resulting possibilities for parameters are added to the search space.
The application test is performed as the precondition of the learnt method. The application of a learnt method for which the test was successful will introduce the open nodes and hypotheses generated during the applicability test as postcondition of the learnt method to the current proof.
In order to evaluate our approach, we carried out an empirical study in different problem domains on a number of theorems. This test set includes the theorems from which new methods were learnt, but most of them are new and more complex. They are from the domains of residue classes (e.g, commutativity of the operation lambda x,y. xy on the residue class set of integers Z_{2}), set theory (e.g., FORALL x,y,z . ((x union y) intersection z) = (x intersection z) union (y intersection z)), and group theory (e.g., group(G, o ,e,i) > FORALL a,b,c,d,f in G. a o (((a^{1} o b) o (c o d)) o f) = (b o (c o d)) o f). The learnt methods were added to the search space in a way that their applicability is checked first, before the existing standard methods are tried.
Table 1 compares the values of matchings and proof length for the three problem domains. It compares these measures when the planner searches for the proof with the standard set of available methods (column marked with S), and when in addition to these, there are also our newly learnt methods available to the planner (column marked with L). "" means that the planner ran out of resources (four hours of CPU time) and could not find a proof plan. The counter matchings counts the successful and unsuccessful application tests of methods in the process of finding a complete successful proof plan. It also contains the method matchings performed by the search engine for learnt methods. Matchings provides an important measure, since on the one hand it indicates how directed was the performed search for a proof. On the other hand, checking the candidate methods that may be applied in the proof is by far the most expensive part of the proof search. Hence, matchings is a good measure to approximate the time needed by the two approaches (i.e., with and without learnt methods) while it is also independent of the concrete implementation inefficiencies.
Domain  Theorems  Matchings  Length  
S  L  S  L
 
Residue  assocz3ztimes  651  113  63  2 
Class  assocz6ztimes  4431  680  441  2 
(using tryanderror method)  average of all  1362.0  219.5  134.0  2.0

Residue  closedz3zplusplus  681  551  49  34 
Class  closedz6zplusplus  3465  2048  235  115 
(using choose method)  average of all  1438.8  918.3  101.0  57.3

Set theory  average of all  33.5  12.5  13.0  2.0

Group theory  average of all (simple)  94.2  79.0  15.5  8.3

Group theory  average of all (complex)    189.6    9.8

We tested the system with (and without) the use of the following learnt methods: for residue classes we used two learnt methods, tryanderror and choose, for set theory we used one learnt method, and for group theory we learnt five new methods, but only used two, since these two are recursive applications of the others. As is evident from Table 1, the number of candidate methods that the planner has to check if they can be applied in the proof (i.e., matchings) is reduced in all domains where our newly learnt methods are available. In general, the more complicated the theorem, the better is the improvement made by the availability of the learnt methods. In the case of group theory, some complex theorems can be proved only within the resource limits when our learnt methods are available to the planner. Hence, the coverage of the system that uses learnt methods is increased. Furthermore, we noticed that for some very simple theorems of group theory, a larger number of matchings is required if the learnt methods are available in the search space. However, for more complex examples, this is no longer the case, and an improvement is noticed. The reason for this behaviour is that additional methods increase the search space, and the application test for learnt methods is expensive, especially when a learnt method is not applicable, but still all possible interpretations of the learnt method outline have to be checked by the search engine.
As expected, the proof length is much reduced by using learnt methods, since they encapsulate patterns in which several other methods are used in the proof.
On average, the time it took to prove theorems of residue classes and conjectures of set theory was up to 50% and 15% shorter, respectively, than without such methods. The search in group theory took approximately 100% longer than without the learnt methods. The time results reflect in principle the behaviour of the proof search measured by method matchings, but also contain the overhead due to the current implementation for the reuse of the learnt methods. For example, the current proof situation is copied for the applicability test of the learnt method, and the new open goals and hypotheses resulting from a successful application are copied back into the original proof.
The reason for the improvements described above is due to the fact that our learnt methods provide a structure according to which the existing methods can be applied, and hence they direct search. This structure also gives better explanation why certain methods are best applied in particular combinations. For example, the simplification method for group theory examples indicates how the methods for associativity, inverse and identity should be combined together, rather than be applied blindly in any possible combination.
There are several limitations of our approach that could be improved in the future. Namely, the learning algorithm may overgeneralise, so we need to examine what are good heuristics for our generalisation and how suboptimal solutions can be improved. In order to reduce unnecessary steps, the preconditions of the learnt methods would ideally be stronger. Currently, we use an applicability test to search if the preconditions of the method outline are satisfied. In the future, preconditions should be learnt as well. Finally, in order to model the human learning capability in theorem proving more adequately it would be necessary to model how humans introduce new vocabulary for new (emerging) concepts.
A demonstration of LearnOmatic implementation can be found on the following web page: http://www.cs.bham.ac.uk/~mmk/demos/LearnOmatic/. Further information, also with links to papers with more comprehensive references can be found on http://www.cs.bham.ac.uk/~mmk/projects/MethodFormation/.
Bundy, A.: The use of explicit plans to guide inductive proofs. In 9th Conference on Automated Deduction. LNCS 310, Springer (1988), 111120.