ABSTRACT
Proof planning is a form of theorem proving in which the proving procedure is viewed as a planning process. The plan operators in proof planning are called methods. In this paper we propose a strategy for heuristically restricting the set of methods to be applied in proof search. It is based on the idea that the plausibility of a method can be estimated by comparing the model class of proof lines newly generated by the method with that of the assumptions and of the theorem. For instance, in forward reasoning when a method produces a new assumption whose model class is not a superset of the model class of the given premises, the method will lead to a situation which is semantically not justified and will not lead to a valid proof in later stages. A semantic restriction strategy is to reduce the search space by excluding methods whose application results in a semantic mismatch. A semantic selection strategy heuristically chooses the method that is likely to make most progress towards filling the gap between the assumptions and the theorem. Each candidate method is evaluated with respect to the subset and superset relation with the given premises. All models considered are taken from a finite reference subset of the full model class. In this contribution we present the model-guided approach as well as first experiments with it.
KEYWORDS: proof planning, automated theorem proving, model generation
The most prominent machine-oriented approaches are based on the resolution [Robinson65] or the tableau calculus [Smullyan68]. In resolution, the problem that consists of assumptions and a theorem is transformed into a normal form, the so-called clause normal form, and a proof is searched for on this clause level. The approach consists of firstly efficiently searching big search spaces, and secondly making good heuristic choices, since the problem is known to be intractable for hard problems. More and more powerful systems have been built on such paradigms, e.g. MKRP [EO86], SETHEO [LSBB92], OTTER [McCune94], SPASS [WGR96], and VAMPIRE [RV01].
On the one hand, these systems show remarkable performance on many problems in particular application areas. Even open mathematical problems could be solved with the assistance of automated provers for the first time. Most notably, the problem whether Robbins algebras are Boolean algebras was successfully proved by an automated theorem prover [McCune97]. On the other hand, these methods unfortunately often fail even for problems that mathematicians would consider trivial.
Humans use approaches to theorem proving which drastically differ from the approaches machine-oriented theorem provers use. A deep understanding what these human methods exactly are is unfortunately still missing and will probably be very difficult to obtain in full generality. However, partial insights can be obtained from the reflections by mathematicians themselves, by which they try to explain the way how they find mathematical proofs [Hadamard44,Waerden64]. The extensive work by Pólya on teaching mathematics [Polya45,Polya54,Polya62] is a rich source of examples how mathematicians prove theorems. Furthermore, the field of mathematical reasoning has been investigated in cognitive science [Braine78,JB91]. This all led to very valuable insights and significant steps towards a computer model how humans perform mathematical reasoning. The area in which attempts are made to develop such insights to a computational model of human theorem proving is called human-oriented theorem proving.
Let us look at some differences between the approaches of human-oriented and machine-oriented theorem proving. When humans try to prove theorems, they usually do not transform the whole problem into a normalised form nor do they search for the solution on a logical level. Instead, they search and deduce the conclusion on a more abstract mathematical level. They decide on what proof methodology to use, and as they proceed in a proof attempt, they choose what inference rule to apply. The proof methodologies and inference rules are obtained from either textbooks or previous experience in similar examples. Unlike most machine approaches, these inference rules are not just a set of rules which are fixed once and for all, but any successful proof of a problem can be stored in the mathematicians' memory (in total or in parts) and may be reused as a new inference step for a new problem. Human-oriented theorem proving is an attempt to model (parts of) the human problem solving behaviour. Since it makes use of domain specific reasoning and allows for a hierarchical approach, the exponential explosion of the search space which is typical for machine-oriented theorem proving can be avoided for many practical problems. The most prominent approach to human-oriented theorem proving is proof planning [Bundy88]. In proof planning, the search for a proof is considered as a planning process in which existing plan operators, called methods, are combined to form a full plan which can then be (hopefully) carried through.
A major difference between the machine-oriented way of proving theorems and the way how humans do it is the size of the search spaces involved. Machines can traverse search spaces with millions of nodes with reasonable resources. Humans, however, cannot do that, but have to rely on keeping search spaces small by putting enormous effort on adequate representations (and using re-representations) as well as making use of good heuristics that explore the more promising parts of the search space first. A major challenge for the field of human-oriented theorem proving is to understand better how humans arrive at such heuristics. Up to now such knowledge is only patchy. For instance, heuristic information can be encoded in form of specificity which prefers a specific method over a general one, or in form of an explicit ranking which rates methods independently of the concrete situation. None of these approaches provides a satisfying answer to what method to choose.
Semantic information seems, to a large extent, to be used as a form of heuristics in human problem solving. Checking an argument in one concrete model or a couple of models seems to be a very important reasoning pattern. If the argument holds in a couple of key models it has significantly gained in evidence. If it fails, it is rejected and a different argument must be constructed. We believe that a semantic approach will play a very important role in modelling the human reasoning capabilities in mathematics. The main idea of the work presented here is that the plausibility of a method (or a step) in a proof plan can be estimated by means of comparing models taken from a reference class of models.
The remainder of the paper is organised as follows. In the next section we describe proof planning in more depth. We discuss model based reasoning in section 3. The semantic restriction and selection strategies which are the main contributions are presented in section 4. Section 5 discusses the state of our implementation and some initial results. We conclude in section 6.
Proof planning [Bundy88] is an approach to human-oriented theorem proving. It is based on the idea that the reasoning process for proving mathematical theorems can be viewed as a planning process which employs abstract plan operators, called methods. Assume we want to prove a mathematical theorem (or conclusion) phi from premises A1,... , An. The picture of proof planning is that we have a gap between the assumptions and the theorem, which we try to fill by proof steps. We try to do that by systematically reducing gaps by introducing intermediate assertions. This is done by methods which represent on the one hand program fragments which generate specific proof parts, and on the other hand high-level specifications of the program fragments which make planning possible. In each planning stage, the proof planner selects a method which fills in the gap between premises and conclusion. A proof plan is found when there are no more remaining gaps on an abstract level. A proof is found when the proof plan can be successfully executed. Note that proof plans may fail.
Methods play the most significant role in proof planning. In Bundy's view [Bundy88] a method consists of a tactic, a precondition, and a postcondition. The tactic is a piece of program code that manipulates the actual proof in a controlled way, and the precondition and the postcondition form a specification of the deductive ability of the tactic. In the Omega proof planner, a slightly extended view has been taken in order to allow automated adaptation of methods to new problems. Huang et al. [HKK92,HKRS94,Kerber98] proposed an idea of separating the procedural and declarative knowledge in the tactic part of the methods.
An Omega method has the following slots: declarations, premises, constraints, conclusions, declarative content, and procedural content as displayed in Figure 1. The declarations specify a signature that declares variables of the method. The premises consist of a list of proof line schemata which are used to prove the conclusions. The constraints contain additional restrictions on the premises and the conclusions. The conclusions slot contains the proof lines which are the goal of the proof. The declarative content provides declarative knowledge that is interpreted by the procedural content. And finally, the procedural content is a piece of program code that takes a piece of declarative knowledge and a pointer to the current proof plan tree, and produces a subtree which can be integrated into the current tree.
Mainly two different types of planning strategies, forward and backward planning, are used in proof planning. In forward planning, the planner searches for a method that takes assumptions (axioms, definitions) or already proved intermediate results and produces new facts that can be used as new assertions in the following planning steps. Initially, there is a gap between the initial assumptions A1, ... , An and the theorem Th as shown in Figure 2. The planner starts from the assumptions and proceeds toward the theorem by filling in the gaps between assumptions and the theorem. When the line generated by the method is identical to the theorem, a proof plan is found.
Likewise, in backward planning, a method takes a theorem and produces subgoals sufficient to deduce the theorem. Some of the subgoals may be identical to existing assumptions, but typically some others are new and need to be added as open goals to be proved in later steps. In Figure 3, the planner starts from the open goal (initially, the theorem Th) and works backward towards the assumptions.
The method for mathematical induction is an abstract level method that transforms the problem to a base case and a step case, in which the proof is obtained by recursive applications of induction rules. Note that we describe here only a simplified form. The proof plan for induction was studied in great detail as the first application of proof planning [BvHHS91].
Figure 4 shows the basic structure of the proof plan for induction.
Let us look at a concrete example. With the inductive definition of +:
Although proof planning allows for more abstract proof search, and typically involves smaller search spaces than traditional machine-oriented theorem proving, this does not mean that the standard problems of search do not constitute a problem. Proof planning can be viewed as a process in which one searches iteratively for a method for the simplification of a given problem until no more subproblems are generated in this process. At each stage of the planning process, the procedure takes one method which has the highest priority. When a method leads to a stage in which no further methods are applicable (or promising) but no full plan has been generated, backtracking becomes necessary and another candidate method is tried. In existing proof planners, heuristic selection is typically performed on static information (such as specificity or explicit rank explained in section 4) attached to each method regardless of the concrete problems. More flexible heuristics seem to be necessary.
Typically the planner finds several candidates that seem equally applicable at each stage of the planning process. However, some branches have dead ends from where the planner can not proceed any more. Figure 6 illustrates the backtracking mechanism where the planner chose a method, proceeded two more steps and found that it was blocked.
Although there is no general way to disambiguate good from bad steps for undecidable problems, a key to successful theorem proving - whether it is done by a human being or by a machine - is to minimise wrong choices in search. In the next section we try to show how models can reduce the backtracking in the search for proof plans.
Humans use models in various forms of reasoning. Models at least contribute to the heuristic information which direction the proof procedure should take. This has been taken into account in several traditional machine-oriented theorem provers [SLM94,CP94,KC00], which have incorporated semantic guidance into their proof search. However, since these provers are based on a low-level refutation procedure, the semantic guidance is restricted to choosing clauses that are most likely to contribute to finding a semantic contradiction. There is a significant difference between machine-oriented and human-oriented attitudes towards how to use models to guide the proof search. Mathematicians use semantics not only to check a contradiction but also to get the implications of a problem and to measure the plausibility of the logical consequences. In this section, we introduce a human-oriented view of model based guidance and present how it works in proof planning.
A visual representation of reasoning was proposed in mental models [JB91], in which deductions are modelled by Venn diagram. For instance, we have two premises: all psychologists are experimenters, and all experimenters are skeptics. The first premise gives rise to two diagrams:
We take up this idea of the mental models to mathematical reasoning by approximating the area of a premise with semantics, and use this information as a heuristic guidance which proof planning steps to perform.
Suppose Gamma is a set of assumptions and phi the theorem, (written as Gamma |- phi). In order to limit the model space, we assume a reference set of interpretations R. We define the model sets MGamma := { M in R | M |= Gamma } and Mphi := { M in R | M |= phi }, then it follows from Gamma |= phi that MGamma subset Mphi. Let Gamma consist of assumptions, for instance, A1, A2, ... , An, we get model sets M1, M2, ... , Mn with respect to the model reference class (note, all Mi subset R). The semantic relations presented by A1 & A2 & ... & An |- phi may be described as ( M1 intersect M2 intersect ... intersect Mn ) subset Mphi, as illustrated for n=3 in Figure 7.
From the early days of automated theorem proving [Gelernter59] semantics played a significant role in pruning the search for proofs. This work has been taken up in more recent systems built on traditional theorem provers such as the SCOTT system [SLM94,HS01] and the semantic clause graph procedure [KC00]. Semantic information can indicate what the theory means, which direction the proof is to proceed, and which one of the candidate steps seems to have the most plausibility to lead to the desired result. In the case of the proof planning approach, in each stage of the above planning procedure we can efficiently reduce the search space if we have a mechanism to rule out less promising methods. The key idea of this research is to use semantics to obtain heuristic knowledge for determining the feasibility of a certain method.
Models can provide heuristics about whether a formula generated by a method can be used as a new assumption or open goal, and if so, to what extent it fills the gap between the initial assumptions and the theorem. As mathematicians seem to use semantics in order to obtain heuristics to solve problems, it should also be natural that human-oriented automated theorem provers do so. In each step of the theorem proving process, the proof planner chooses a method that fills in the gap between assumptions and the theorem either by adding a new open goal before the theorem or by adding new assumptions after the initial assumptions. The main idea we introduce here is that the plausibility of a method can be estimated by means of comparing the semantics of new lines generated by the method with those of the assumptions and of the theorem. We extend the idea that any model of the assumptions Gamma should also be a model of the theorem phi to the models of intermediate results as well. All the newly generated lines must satisfy the semantic rules. In forward proof planning, an application of a method adds new lines and may delete old lines from the planning state (the deleted lines are considered to be replaced by the newly generated ones) and the assumption is updated from the initial set Gamma to a new set Gamma'. The models of the intersection of all assumptions should also be models of the theorem. That is, MGamma' subset Mphi must follow from Gamma |- phi ==> Gamma' |- phi, where ==> indicates the transition from one planning state to the next. To put it in different words, any transition for which the semantic restriction MGamma' subset Mphi does not hold cannot lead to a valid proof and can be excluded from the search space. Likewise, in backward proof planning an application of a method adds a new open goal phi'. The models of new open goals should contain all models of the intersection of all assumptions. MGamma subset Mphi' must follow from Gamma |- phi ==> Gamma |- phi'; if not it can be excluded from search.
Proof planning incorporates formalised proof fragments in methods. However, often existing control rules are not sufficient to heuristically choose one method when two or more methods are applicable. In areas in which strong methods are available - like in the area of mathematical induction - this is no major drawback, since the specification is strong enough to restrict the applicability to one or very few methods. In other areas, however, such heuristic knowledge is not given and existing proof planners can easily fall into dead end situations in which no more methods are applicable. In those cases, backtracking has to be done in order to return to a previous state in which other candidate methods were available. As in any search problem, the efficiency of proof planning depends on exploring the more promising parts of the search space first. We need good heuristics in order to evaluate each branch of the search tree. Some heuristic information can be encoded in form of specificity which prefers the application of a specific method to a general one. Another approach is to explicitly rank methods in plan construction, for instance, by a rating, and to apply the applicable method that has the highest rank (measuring its usefulness) independently of the concrete situation.
In this work we use models in proof planning to exclude implausible candidate methods (semantic restriction) and, where there is still more than one method applicable, to estimate which one would fill the gap better (semantic selection).
A semantic restriction strategy is a strategy that rejects any methods whose applications produce an intermediate result with which the semantic relation discussed in section 3.2 is not satisfied. This simple strategy can avoid many useless steps. For example, the search for a proof plan in Omega for {S -> FORALL x Q(x)} |- FORALL x (S -> Q(x)) can be reduced from 10 to only 3 steps with the semantic restriction strategy as we will describe in section 5.
In some cases, it is possible for the application of a method to produce a new assumption with which the theorem cannot be deduced. As an example, suppose premises and a conclusion that satisfy A1, A2, A3 |- phi with M1 intersect M2 intersect M3 subset Mphi. The following method may be applied, which replaces line L1 (indicated by - L1) by line L4 (indicated by + L4). The intermediate result psi from line L4 will participate in the proof as a new assumption in the next stages.
| without | Method 1 | |
| premises | - L1, L2, L3 | |
| conclusions | + L4 | |
| proof schema | L1. H |- A1 | |
| L2. H |- A2 | ||
| L3. H |- A3 | ||
| L4. H |- psi | ||
Suppose the initial premises are A1, A2, A3 and the theorem is phi. We apply Method 2 that uses premises A2 and A3 and adds a new premise A4 in order to derive phi.
| without | Method 2 | |
| premises | L2, L3, + L4 | |
| conclusions | - L5 | |
| proof schema | L2. H |- A2 | |
| L3. H |- A3 | ||
| L4. H |- A4 | ||
| L5. H |- phi | ||
A semantic selection strategy does not only restrict the number of methods in a semantic way, but heuristically chooses one that seems to make most progress towards filling the gap. It evaluates the semantics of each method where all methods seem equally applicable if semantics is not taken into consideration. We propose to distinguish more promising methods from the others by estimating how well they semantically match the open goals with respect to the subset and superset relation with the given premises. In a forward reasoning approach, the methods which restrict the model class best in the direction to the theorem are better. Concretely, if the problem is Gamma |- phi and two methods mu1 and mu2 transform the problem to Gamma1 |- phi and Gamma2 |- phi, respectively, with MGamma subset MGammai and MGammai subset Mphi with i=1 or 2, select MGamma1 rather than MGamma2 if and only if MGamma1 is bigger than MGamma2. (If the sets have the same cardinality, make a random choice.) Likewise in backwards reasoning, a method that produces the smallest model class should be selected.
Even when two or more candidate methods are applicable without semantic conflict, the plausibility of a method may still be assessed by diagrams. Assume the initial assumption set Gamma, assumption set Gamma1 obtained by the application of method mu1, Gamma2 obtained by the application of mu2, and the theorem phi.
The selection strategy in backward proof planning proceeds towards the assumptions. In Figure 10, unlike in forward planning, we choose Gamma1 rather than Gamma2 since it is closer to the initial assumption Gamma and it should be easier to be derived from Gamma.
We have implemented the semantically restricted proof planner based on the Omega system [BCFFx97], which is a multi-initiative theorem proving environment. We use FINDER [Slaney95] as a model generator. The semantic proof planner transforms each proof line to FINDER readable format and generates a finite number of finite model sets.
We suppose the following problem, which is simple enough to illustrate the approach: {S -> FORALL x Q(x)} |- FORALL x ( S -> Q(x) )
In the initial state, there are two methods applicable. The Omega proof planner chooses a backward method mu1 by its existing method selection strategy. However, the application of mu1 will give rise to backtracking.
To verify the first condition, we generate models from the premise S -> FORALL x Q(x) with the domain fixed to D = {0, 1}.
| Model | S | Q(0) | Q(1) |
| m1 | F | F | F |
| m2 | F | T | F |
| m3 | F | F | T |
| m4 | F | T | T |
| m5 | T | T | T |
Similarly, there is another chance to avoid backtracking as shown in Figure 11. As a result of the model-based restriction, the procedure can avoid two backtrackings and find the proof in three steps only.
Although the way humans manage to keep search spaces in theorem proving small has not been fully understood, semantic guidance in form of checking whether certain reasoning steps make sense in well chosen models seems to provide powerful restriction and selection strategies. In this paper we discussed how to apply such strategies to proof planning, which is a human-oriented way of theorem proving.
We have introduced the main ideas of proof planning and how models can help to reduce wrong choices in proof planning. Wrong choices result in backtracking and slow down the process of proof planning. We have demonstrated the approach of semantic restriction with a simple example. One important question left for future investigation is how to select informative models. Currently we use the FINDER model generator for the generation of a reference set of models. While this is already informative (and reduces search) in some examples like the one we have looked at in section 5, we want to improve on it in several ways. In particular, we want to keep the reference set of models small and informative at the same time. In order to do so other approaches for using models are necessary. We think that using typical models [Kerber93a] is promising for this purpose.
For the future we plan to deepen our understanding by more experiments. In particular we want to investigate different approaches to semantic selection strategies.