ABSTRACT
In this paper we present a framework for automated learning within mathematical reasoning systems. In particular, this framework enables proof planning systems to automatically learn new proof methods from well-chosen examples of proofs that use a similar reasoning pattern to prove related theorems. Our framework consists of a representation formalism for methods and a machine learning technique that can learn methods in this formalism. We present the implementation of this framework within the Omega proof planning system, and some experiments we ran on this implementation to evaluate the validity of our approach.
Proof planning [pub349] is an approach to theorem proving which uses proof methods rather than low-level logical inference rules to find a proof of the theorem at hand. A proof method specifies a general reasoning pattern that can be used in a proof, and typically represents a number of individual inference rules. For example, mathematical induction can be encoded as a proof method. Proof planners search for a proof plan of a theorem, which consists of applications of several methods. An object-level logical proof may be generated from a successful proof plan. Proof planning is a powerful technique because it often dramatically reduces the search space: the search is carried out at the level of abstract methods rather than at the level of the several inference rules that make up a method, so typically fewer methods than inference rules are explored in the search space. Proof planning also allows the reuse of proof methods, and moreover generates proofs whose reasoning strategies are transparent, so they may have an intuitive appeal to a human mathematician. Indeed, the communication of proofs amongst mathematicians can be viewed as taking place at the level of proof plans.
One of the ways to extend the power of a proof planning system is to enlarge the set of available proof methods. This is particularly beneficial when a class of theorems can be proved in a similar way, so that a new proof method can encapsulate the general reasoning pattern of a proof for such theorems. A proof method in proof planning basically consists of a triple: preconditions, postconditions and a tactic. A tactic is a program which, given that the preconditions are satisfied, transforms an expression representing a subgoal so that the transformed subgoal satisfies the postconditions. If no method at an appropriate level is available in a given planning state, then a number of lower-level methods (with inference rules as the lowest-level methods) have to be applied in order to prove a given theorem. Alternatively, a new method can be added by the developer of the system. However, this is a very knowledge-intensive task and hence presents a difficulty in applying a proof strategy to many domains.
In this work, we show how a system can learn new methods automatically given a typically small number of well-chosen examples of related proofs of theorems. This is a significant improvement, since examples (e.g., in the form of classroom example proofs) typically exist in abundance, while the extraction of methods from these examples can be considered a major bottleneck of the proof planning methodology.
The idea is that the system starts by learning simple proof methods. As the database of available proof methods grows, the system can learn more complex proof methods. Inference rules can be treated as methods by assigning pre- and postconditions to them. Thus, from the learning perspective, we can have a unified view of inference rules and methods as given sequences of primitives from which the system is learning a pattern. We will refer to all the existing methods available for the construction of proofs as primitive methods. As new methods are learnt from primitive methods, these too become primitive methods from which yet more new methods can be learnt. Clearly, there is a tradeoff between the increased search space due to a larger number of methods and the increasingly better directed search for subproofs covered by the learnt methods. Namely, on the one hand, if there are more methods, then the search space is potentially larger. On the other hand, the organisation of the planning search space can be arranged so that the newly learnt, more complex methods are tried first. If a learnt method is found to be applicable, then instead of a number of planning steps (corresponding to the lower-level methods encapsulated by the learnt method), the proof planner needs to make only one step. Generally, proof plans consisting of higher-level methods will be shorter than the corresponding plans that consist of lower-level methods. Hence, the search for a complete proof plan is shallower, but also bushier. In order to measure this tradeoff between the increased search space and better directed search, an empirical study is carried out in the second part of this paper.
Here, we present a hybrid proof planning system Learnomatic, which uses the existing proof planner Omega [omega], and combines it with our own machine learning system. This enhances the Omega system by providing it with the learnt methods that Omega can use in proving new theorems.
Automatic learning by reasoning systems is a difficult and ambitious problem. Our work demonstrates one way of starting to address this problem, and by doing so, it presents several contributions to the field. First, although machine learning techniques have been around for a while, they have been relatively little used in reasoning systems. Making a reasoning system learn proving strategies from examples, much like children learn to solve problems from examples demonstrated to them by the teacher, is hard. Our work makes an important step in a specialised domain towards a proof planning system that can reason and learn.
Second, proof methods have complex structures, and are hence very hard to learn by the existing machine learning techniques. We approach this problem by abstracting as much information from the proof method representation as needed, so that the machine learning techniques can tackle it. Later, after the reasoning pattern is learnt, the abstracted information is restored as much as possible.
Third, unlike in some of the existing related work [fuchs98,schulz00], we are not aiming to improve ways of directing proof search within a fixed set of primitives. In theorem proving systems these primitives are typically inference steps or tactics, and in proof planning systems these primitives are typically proof methods. Rather, we aim to learn the primitives themselves, and hence improve the framework and reduce the search within the proof planning environment. Namely, instead of searching amongst numerous low level proof methods, a proof planner can now search for a newly learnt proof method which encapsulates several of these low level primitive methods.
The aim of our work is to learn and generalise such patterns into proof methods that can then be used for proofs of other related theorems. We now show how this learning is done automatically within our framework.
The methods we aim to learn are complex and beyond the complexity that can typically be tackled in the field of machine learning. Thus, we first simplify the problem and aim to learn so-called method outlines. For this purpose we use a simple representation formalism, described next, which abstracts away as much information as possible for the learning process. Second, we restore the necessary information as far as possible so that the proof planner can use the newly learnt method, using the mechanism described in Section 3. Some information may be irrecoverably lost; in this case, some extra search in the application of the newly learnt methods will typically be necessary.
Let us define the following language L, where P is a set of known identifiers of primitive methods used in a method that is being learnt:
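As an illustration of the kind of pattern language involved, the following sketch is based only on the operators that appear later in the paper (constant repetitions such as [a^{2}], the star operator [a^{*}], and disjunctions); the Python representation and all names in it are our own illustrative choices, not part of Learnomatic or the actual definition of L:

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical constructors for method-outline patterns; the names are our
# illustrative choices, not identifiers from Learnomatic.
@dataclass(frozen=True)
class Prim:              # a primitive-method identifier from P
    name: str

@dataclass(frozen=True)
class Seq:               # concatenation [e1, e2, ...]
    items: Tuple

@dataclass(frozen=True)
class Power:             # constant repetition [e^n]
    body: object
    n: int

@dataclass(frozen=True)
class Star:              # arbitrary repetition [e^*]
    body: object

@dataclass(frozen=True)
class Or:                # disjunction of alternatives
    alts: Tuple

def instances(e, star_bound=3):
    """Enumerate the sequences of method identifiers an expression matches,
    expanding '*' only up to star_bound repetitions."""
    if isinstance(e, Prim):
        return [[e.name]]
    if isinstance(e, Seq):
        out = [[]]
        for item in e.items:
            out = [pre + suf for pre in out for suf in instances(item, star_bound)]
        return out
    if isinstance(e, Power):
        return instances(Seq(tuple([e.body] * e.n)), star_bound)
    if isinstance(e, Star):
        return [s for k in range(star_bound + 1)
                for s in instances(Power(e.body, k), star_bound)]
    if isinstance(e, Or):
        return [s for alt in e.alts for s in instances(alt, star_bound)]
    raise TypeError(e)
```

For instance, the outline [a^{*}, b, c], encoded as Seq((Star(Prim("a")), Prim("b"), Prim("c"))), matches the sequences [b, c], [a, b, c], [a, a, b, c], and so on.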
Our learning technique considers some typically small number of positive examples, which are represented in terms of sequences of identifiers for primitive methods, and generalises them so that the learnt pattern is in the language L. The pattern is of smallest size with respect to a defined heuristic measure of size [jkpb02tr], which essentially counts the number of primitives in an expression. The intuition is that a good generalisation is one that reduces the sequences of method identifiers to the smallest number of primitives (e.g., [a^{2}] is better than [a,a]). The pattern is also most specific (or equivalently, least general) with respect to the definition of specificity spec, which is measured in terms of the number of nestings in each part of the generalisation [jkpb02tr]. Again, this is a heuristic measure. The intuition behind it is that we give nested generalisations priority, since they are more specific and hence less likely to overgeneralise.
We take both the size (first) and the specificity (second) into account when selecting the appropriate generalisation. If the generalisations considered have the same rating according to the two measures, then we return all of them. For example, consider two possible generalisations constructed by the learning algorithm described below: [[a^{2}]^{*}] and [a^{*}]. According to size, size([[a^{2}]^{*}]) = 1 and size([a^{*}]) = 1. However, according to specificity, spec([[a^{2}]^{*}]) = 2 and spec([a^{*}]) = 1. Hence, the algorithm selects [[a^{2}]^{*}].
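The two heuristic measures can be sketched as follows, using an ad hoc tuple encoding of patterns; the encoding and function bodies are our illustration of the intuitions above, not the definitions from [jkpb02tr]:

```python
# Ad hoc tuple encoding of patterns (our illustration):
#   ("p", name)     a primitive method identifier
#   ("seq", [...])  a sequence of patterns
#   ("pow", e, n)   constant repetition [e^n]
#   ("star", e)     arbitrary repetition [e^*]

def size(e):
    """Heuristic size: count the primitive occurrences in the expression."""
    tag = e[0]
    if tag == "p":
        return 1
    if tag == "seq":
        return sum(size(x) for x in e[1])
    return size(e[1])        # "pow" and "star" do not duplicate primitives

def spec(e):
    """Heuristic specificity: depth of nesting of repetition operators."""
    tag = e[0]
    if tag == "p":
        return 0
    if tag == "seq":
        return max((spec(x) for x in e[1]), default=0)
    return 1 + spec(e[1])    # each "pow"/"star" adds one level of nesting

a = ("p", "a")
g1 = ("star", ("pow", a, 2))   # [[a^2]^*]
g2 = ("star", a)               # [a^*]
```

On these encodings, size(g1) == size(g2) == 1, but spec(g1) == 2 > spec(g2) == 1, matching the worked comparison in the text, so g1 is preferred.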
Here is the learning algorithm. Given some number of examples (e.g., e_{1}=[a,a,a,a,b,c] and e_{2}=[a,a,a,b,c]):
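The algorithm itself is detailed in [jkpb02tr]; the following sketch (ours, not Learnomatic's code) illustrates only the flavour of one generalisation step on the two example sequences above, under the simplifying assumptions noted in the comments. The real algorithm instead searches for the smallest, most specific generalisation in L.

```python
from itertools import groupby

def compress(seq):
    """Run-length compress a sequence of method identifiers:
    [a, a, a, a, b, c] -> [(a, 4), (b, 1), (c, 1)]."""
    return [(m, len(list(g))) for m, g in groupby(seq)]

def generalise(examples):
    """One illustrative generalisation step: align the run-length
    compressions of the examples, and replace a primitive whose repetition
    count differs across the examples by a starred primitive."""
    comps = [compress(e) for e in examples]
    # Simplifying assumption of this sketch: the compressions align
    # position by position, with matching primitives at each position.
    assert len({len(c) for c in comps}) == 1
    pattern = []
    for runs in zip(*comps):
        names = {m for m, _ in runs}
        counts = {n for _, n in runs}
        assert len(names) == 1
        m = names.pop()
        if len(counts) == 1:
            n = counts.pop()
            pattern.append(m if n == 1 else f"{m}^{n}")
        else:
            pattern.append(f"{m}^*")
    return pattern

e1 = ["a", "a", "a", "a", "b", "c"]
e2 = ["a", "a", "a", "b", "c"]
```

Here generalise([e1, e2]) yields the pattern [a^*, b, c], since the repetition count of a differs between the two examples while b and c agree.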
The learning algorithm was implemented in Standard ML v.110. Its inputs are the sequences of method identifiers extracted from proofs constructed in Omega, and its outputs are method outlines, which are passed back to Omega. The algorithm was tested on several examples of proofs and successfully produced the required method outlines.
In particular, for our example of the residue class theorems above, the sequences of method identifiers from the proofs of these theorems are as follows:
Due to lack of space, the reader is referred to [jkpb02tr] for a discussion of some properties of our learning algorithm. There are some disadvantages to our technique, mostly related to the run-time speed of the algorithm relative to the length of the examples considered for learning. The algorithm can deal in an optimal way with the relatively small examples that we encounter in our application domain. The complexity of the algorithm is exponential in the worst case; hence, we use some heuristics for large and badly behaved examples [jkpb02tr].
Method outlines that have been learnt so far do not contain all the information which is needed for the proof planner to use them. For instance, they do not specify what the pre- and postconditions of methods are, nor do they specify how the number of loop applications of methods is instantiated when the outline is used to prove a theorem. Hence, we need to restore the missing information.
For each learnt outline we automatically create a method whose precondition is fulfilled if there is a sequence of methods that is an instantiation of the method outline and each method of the sequence is applicable. The postcondition introduces the new open goals and hypotheses resulting from applying the methods of the sequence to the current goal. We will call this kind of method a learnt method.
Since methods in Omega may be very complex, the precondition of a learnt method cannot be extracted from the pre- and postconditions of the uninstantiated methods in the method outline. Hence, we actually have to apply a method to produce a proof situation in which we can test the preconditions of the subsequent method in the method outline. That is, we have to perform proof planning guided by the learnt outline.
The precondition test performs a depth-first search on the learnt method outline. Besides the choice points of the learnt method, i.e., disjunctions and the number of repetitions for "*", we also have to decide to which open goal every single method in the method outline should be applied. Additionally, for methods containing parameters, an instantiation has to be chosen. More details on the application test and on the reuse of learnt methods can be found in [jkpb02tr].
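A minimal sketch of this depth-first applicability test, assuming hypothetical stand-ins applicable and apply_method for the planner's precondition test and method application, and ignoring the choice of open goal and parameter instantiations that the full test also makes:

```python
def candidates(outline, star_bound=3):
    """Enumerate, depth first, the method sequences an outline denotes,
    expanding '*' only up to star_bound repetitions.  Outlines use an
    illustrative tuple encoding: ("p", name) | ("seq", [...]) |
    ("or", [...]) | ("star", e)."""
    tag = outline[0]
    if tag == "p":
        yield [outline[1]]
    elif tag == "seq":
        parts = outline[1]
        if not parts:
            yield []
        else:
            for head in candidates(parts[0], star_bound):
                for tail in candidates(("seq", parts[1:]), star_bound):
                    yield head + tail
    elif tag == "or":                  # disjunction: try alternatives in order
        for alt in outline[1]:
            yield from candidates(alt, star_bound)
    elif tag == "star":                # try 0, 1, 2, ... repetitions
        for k in range(star_bound + 1):
            yield from candidates(("seq", [outline[1]] * k), star_bound)

def applicability_test(outline, state, applicable, apply_method, star_bound=3):
    """Return the first candidate sequence whose methods are applicable in
    turn, threading a copy of the proof state through the applications,
    or None if no candidate applies."""
    for seq in candidates(outline, star_bound):
        s, ok = state, True
        for m in seq:
            if not applicable(m, s):
                ok = False
                break
            s = apply_method(m, s)
        if ok:
            return seq, s
    return None
```

As a toy check, with a state that simply counts how many a-steps remain, the outline [a^*, b] instantiates to the method sequence [a, a, b] when two a-steps are needed before b becomes applicable.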
In order to evaluate the success of our approach, we carried out an empirical study in different problem domains on a number of theorems. This test set includes the theorems from which new methods were learnt, but most of them are new and more complex. In particular, we tested our framework on examples of residue classes (e.g., our running examples), set theory, and group theory. An example set theory theorem and non-theorem are: FORALL x,y,z . ((x union y) intersection z) = (x intersection z) union (y intersection z) and FORALL x,y,z . ((x union y) intersection z) = (x union z) intersection (y union z). Example group theory theorems are: a o (((a^{-1} o b) o (c o d)) o f) = (b o (c o d)) o f and (c o (b o (a^{-1} o (a o b^{-1})))) o (((d o a) o a^{-1}) o f) = c o (d o f). The aim of these experiments was to investigate whether the proof planner Omega, enhanced with the learnt methods, can perform better than the standard Omega planner. There are several reasons for comparing the performance of Omega with and without a learning capability, rather than with the performance of other learning reasoning systems. First, there are only a few proof planners around; in the future, we plan to apply our learning approach to the lambdaClam proof planner in order to assess the benefits of our approach in other systems. Second, as discussed in Section 5, little work has been done on applying machine learning techniques to automated theorem provers, hence a direct comparison with another approach is infeasible. The learnt methods were added to the search space in such a way that their applicability is checked first, before the existing standard methods.
The measures that we consider are: coverage -- the ability to prove new theorems with the help of learnt methods; timing -- the time it takes to prove a theorem; proof length -- the number of steps in the proof plan; and matchings -- the number of all true and false attempts to match candidate methods. Checking the candidate methods that may be applied in the proof, i.e., matchings, is by far the most expensive part of the proof search, and is hence the best measure to indicate the validity of our approach.
Table 1 compares the values of matchings and proof length for the three problem domains. Note that assocz3ztimes and closedz3zplusplus in Table 1 are our second and first running example, respectively. It compares the values for these measures when the planner searches for the proof with the standard set of available methods (columns marked with S), and when, in addition to these, our newly learnt methods are also available to the planner (columns marked with L). "--" means that the planner ran out of resources (four hours of CPU time) and could not find a proof plan.
Theorem                            Matchings        Length
                                   S       L        S      L
----------------------------------------------------------------
assocz3ztimes                      651     113      63     2
assocz6ztimes                      4431    680      441    2
average res. class                 1362.0  219.5    134.0  2.0
----------------------------------------------------------------
closedz3zplusplus                  681     551      49     34
closedz6zplusplus                  3465    2048     235    115
average res. class                 1438.8  918.3    101.0  57.3
----------------------------------------------------------------
average set examples               33.5    12.5     13.0   2.0
----------------------------------------------------------------
average group theory (simple)      94.2    79.0     15.5   8.3
----------------------------------------------------------------
average group theory (complex)     --      189.6    --     9.8
The next two theorems (about the closed property), and the average of all theorems of a similar type, can use the choose method. choose can only prove a subpart of a theorem, so a number of other methods need to be used in addition to choose in order to prove the theorem. Hence, the proofs of this type are longer than the proofs of the former type (which use tryanderror), as is also evident from Table 1. As expected, the improvement in the number of matchings reflects this, and is a factor of two on average. In general, the more complicated the theorem, the greater the improvement made by the availability of the learnt methods.
It is evident from Table 1 that the number of matchings is improved, but it is only reduced by about 15%. We noticed that for some very simple theorems, a larger number of matchings is required if the learnt methods are available in the search space. However, for more complex examples this is no longer the case, and an improvement is noticed. The reason for this behaviour is that additional methods increase the search space, so when there are only a few ways to apply methods, as in the case of simple theorems, this causes some overhead. In the case of complex examples, there are many more possible ways to apply methods, hence the presence of complex learnt methods causes little or no overhead, or in fact a reduced number of matchings.
The success of our approach is also evident from the fact that, when our learnt methods are not available to the planner, it cannot prove some complex theorems; i.e., the coverage of the system that uses learnt methods is increased. When trying to apply methods such as associativity left or right, for which the planner has no control knowledge about their application, it runs out of resources. Our learnt methods, however, provide control over the way the methods are applied in the proof, and enable the planner to generate a proof plan. Again, the proof length is reduced by using learnt methods, as expected.
On average, the time it took to prove theorems of residue classes with the newly learnt methods was up to 50% shorter than without such methods. For conjectures in set theory, the proof search with learnt methods was about 15% shorter. The search in group theory took approximately 100% longer than without the learnt methods. The time results in principle reflect the behaviour of the proof search as measured by method matchings, but also contain the overhead due to the current implementation of the reuse of learnt methods (see Section 3). For example, the current proof situation is copied for the applicability test of the learnt method, and the new open goals and hypotheses resulting from a successful application are copied back into the original proof. This overhead could be reduced in later versions.
In general, the coverage when using learnt methods is improved, which is also indicated by the fact that using learnt methods Omega can prove theorems that it can otherwise not prove.
The reason for the improvements described above is due to the fact that our learnt methods provide a structure according to which the existing methods can be applied, and hence they direct search. This structure also gives a better explanation why certain methods are best applied in particular combinations. For example, the simplification method for group theory examples indicates how the methods about associativity, inverse and identity should be combined together, rather than applied blindly in any possible combination.
Our approach to learning methods is related to techniques for learning macro-operators, which were originally proposed for planning in artificial intelligence [rubykibler89,korf85]. Within theorem proving and proof planning, however, related work is scarce. For example, some work has been done on applying machine learning techniques to theorem proving, in particular on improving the proof search [fuchs98,schulz00]. However, not much work has concentrated on high-level learning of the structures of proofs and on extending the reasoning primitives within an automated theorem prover.
Silver [pub220] and Desimone [pub321] used precondition analysis, which learns new inference schemas by evaluating the pre- and postconditions of each inference step used in a proof. A dependency chart between these pre- and postconditions is created, and constitutes the pre- and postconditions of the newly learnt inference schema. These schemas are syntactically complete proof steps, whereas Omega methods contain arbitrary function calls which cannot be determined by just evaluating the syntax of the inference steps.
Kolbe, Walter, Melis and Whittle have done related work on the use of analogy [pubmw] and proof reuse [kolbewalther94,kolbe97]. Their systems require a lot of reasoning with one example to reconstruct the features which can then be used to prove a new example. The reconstruction effort needs to be spent in every new example for which the old proof is to be reused. In contrast, we use several examples to learn a reasoning pattern from them, and then with a simple application, without any reconstruction or additional reasoning, reuse the learnt proof method in any number of relevant theorems.
In terms of the learning mechanism, more recent work on learning regular expressions, grammar inference and sequence learning [sungiles00] is related. Learning regular expressions is equivalent to learning finite state automata, which are also recognisers for regular grammars. Muggleton has done related work on grammatical inference methods [muggleton90]. The main difference from our work is that these techniques typically require either a large number of examples in order to make a reliable generalisation, or supervision by an oracle which confirms when new examples are representative of the inferred generalisation. Furthermore, these techniques learn sequences only, whereas our language is larger than that of regular grammars, as it includes constant repetitions of expressions and expressions represented as trees.
Also related is the work on pattern matching in DNA sequences [brazma94], as in the GENOME project; some of the ideas behind our learning mechanism have been inspired by this work.
In this paper we described a hybrid system, Learnomatic, which is based on the Omega proof planning system enhanced by automatic learning of new proof methods. This is an important advance in addressing a difficult problem, since it takes first steps in the direction of enabling systems to improve their own reasoning power. Proof methods can be either engineered or learnt. Engineering is expensive, since every single new method has to be freshly engineered. Hence, it is better to learn, whereby we have a general methodology that enables the system to learn new methods automatically. The hope is that ultimately, as the learning becomes more complex, the system will be able to find better or new proofs of theorems across a number of problem domains.
There are several limitations of our approach that could be addressed in the future. Namely, the learning algorithm may overgeneralise, so we need to examine what good heuristics for our generalisation are and how suboptimal solutions can be improved. In particular, we want to address the question of how to deal with noise in the examples. In order to reduce unnecessary steps, the preconditions of the learnt methods would ideally be stronger: currently, we use an applicability test to search whether the preconditions of the method outline are satisfied, but in the future, preconditions should be learnt as well. Finally, in order to model the human learning capability in theorem proving more adequately, it would be necessary to model how humans introduce new vocabulary for new (emerging) concepts.
A demonstration of the Learnomatic implementation can be found on the following web page:
http://www.cs.bham.ac.uk/~mmk/demos/LearnOmatic/
Further information, including links to papers with more comprehensive references, can be found on:
http://www.cs.bham.ac.uk/~mmk/projects/MethodFormation/