ABSTRACT
In this contribution we present a variant of a resolution theorem prover which selects resolution steps based on the proportion of models a newly generated clause satisfies compared to all models given in a reference class. This reference class is generated from a subset of the initial clause set. Since the empty clause does not satisfy any models, preference is given to such clauses which satisfy few models only. Because computing the number of models is computationally expensive on the one hand, but will remain almost unchanged after the application of one single resolution step on the other hand, we adapt Kowalski's connection graph method to store the number of models at each link.
Resolution [Robinson65] is probably the best developed approach to automated theorem proving, in which the problem is transformed into a clausal form and a proof is searched for on this clause level. Many powerful systems have been built on this paradigm, e.g. MKRP [EO86], Otter [McCune94], Spass [WGR96], and Vampire [RV01]. On the one hand, these systems show remarkable performance on many problems in particular application areas. Recently a variant of Otter successfully solved the Robbins problem [McCune97] that had remained as an open problem for several decades. On the other hand, since these methods depend largely on blind search, exponential explosion is unavoidable. The key technique consists of efficiently searching through a big search space and making good heuristic choices.
Semantic information seems, to a large extent, to be used as a form of heuristic knowledge in human problem solving. Therefore it is natural to think that automated theorem provers can also benefit from the use of semantic information. Some work has been done to integrate semantic information into resolution theorem proving. Slaney et al. [SLM94,HS01] developed the SCOTT-system which combines the model generator FINDER [Slaney94] and the resolution theorem prover Otter. To speak in a simplified way, SCOTT extends the set of support strategy by the introduction of a semantic satisfiability check, which is based on iterated model generation. It could be shown that this approach makes a significant improvement for some examples. Plaisted et al. [CP94,PZ00] implemented a theorem prover using hyper-linking techniques [LP92] with the concept of semantic guidance.
The connection graph proof procedure was first introduced by Kowalski [Kowalski75]. The main idea is to explicitly represent all resolution possibilities as links in a graph between literals, where the links store not only the applicability but additionally the most general unifier for the application. MKRP is a big theorem proving system which was built in the 1970's and 1980's based on the connection graph method. For a recent overview on the clause graph procedure see [SW02]. Independently of open theoretical questions like completeness investigated by Eisinger [Eisinger91] the method has not attracted much attention in more recent resolution theorem provers since the computational overhead to explicitly represent the unifiers in a graph does not pay off. Other techniques like indexing are more important for efficiency. In other words the graph is used to store information which can be more cheaply computed from first principles. In particular, the heuristic choice on which link to resolve only marginally makes use of the links.
The approach presented here can be seen as an attempt to use the connection graph in resolution theorem proving in a more promising way, namely heuristic guidance is provided by the number of models of clauses. First a number of models is generated from the given theory, then a clause graph is constructed out of the set of all clauses, and finally a proof is searched for by heuristic selection of the links. The heuristic selects a link that is most promising in the sense that it has a minimal number of models which fulfil its offspring. Our prototype implementation is built using KEIM [HKKMx94b]. Note that an adaptation of the semantic guidance presented in this paper can be applied to clause selection in other resolution theorem provers as well. We chose the clause graph as it provides a convenient data structure to store the relevant information.
The clause graph represents the set of clauses as a graph in which each node corresponds to a literal. The graph consists of one or more groups of literals and each group is a clause. Two complementary unifiable literals are connected by a link which indicates the unification possibility between the literals.
Suppose we have a clause set {NOT P(a), P(x) | NOT Q(x) | R(x), Q(a) | NOT R(y)}, the set has 3 clauses and 6 literals. Since NOT P(a) and P(x) can be resolved by a common unifier, they are linked together.
Since the link resolved upon is deleted, the clause graph now contains two pure literals, R(x) and NOT R(y), which have no links to others. The clauses containing pure literals can be removed from the clause graph, since they cannot have the empty clause as their successor. When a clause is removed, all links connected to the clause are also removed as no links can exist without nodes. The removal of a link may produce a further pure literal in the clause connected to the opposite end, which can also be removed. In fact, an application of a resolution step can cause a chain reaction of reductions, which in extreme cases can lead to the collapse of the whole graph, showing that the original problem was satisfiable. This is the case for the graph in Figure 2.
The semantics of a set of first-order formulas is represented by a pair M =(D, I), consisting of a domain and an interpretation. The domain D is a nonempty set of objects, and the interpretation I is a mapping from the terms of the language to values in the domain, and formulas are mapped to truth values. We say M is a model of a formula (or formula set) if the formula is (or all formulas in the formula set are) evaluated to true.
Assume we have a first-order problem given by a formula set Gamma and formula phi for which we want to show that phi follows from Gamma (i.e., Gamma |= phi ). Furthermore we assume that the formula set Gamma is consistent, that is, that it has models. In resolution theorem proving phi is negated and it is proved that Gamma union {NOT phi } is unsatisfiable, that is, has no model.
For our procedure we make the assumption - as in set of support - that the background theory Gamma is consistent. Hence it has a model. Furthermore the method as described below makes use of the assumption that there are finite models for Gamma since we make use of a model generator which generates finite models (If no finite models are found, the method reduces to the standard connection graph method). This is, however, an assumption that can be relaxed if we proceed to a model generator which can generate infinite models. Anyway, we think that the class of problems, for which finite models are sufficient, is rich enough that it deserves investigation.
Slaney's FINDER system [Slaney94] is a model generator which fixes the domain to a specific number of objects and then performs an exhaustive search in the space of functions and constants which can be satisfied in that domain. We use FINDER as a model generator to generate a set of models M = { m1, m2,... , mn } from a formula set Gamma ={ phi 1, phi 2,... , phi m }.
FINDER generates models with a fixed cardinality of the domain D. It is possible to give lower and upper bounds for the cardinality of D. We may either take all models found in a certain bound or limit the result to a certain number of models we are interested in. We have experimented which number of models should be considered best. In our experiments, as described in section 7, we use a cardinality of the reference set of models M between 40 and 100.
Let Gamma be a set of assumptions and phi be the conclusion, such that Gamma |= phi , and let Gamma * and { NOT phi } * be the sets of clauses obtained from normalisation of Gamma and { NOT phi }, respectively. The resolution procedure searches for a contradiction (an empty clause) from Gamma * union { NOT phi } * . The finite model set M = { m1, m2, ... , mn | mi |= Gamma * } is generated from the assumptions Gamma * with the interpretation domain fixed to a finite set of objects.Note that the models have to interpret any additional symbols in phi * as well. Let C be the corresponding initial clause set. C = Gamma * union { NOT phi } * We use Im to denote the interpretation of a clause C (C in C) using a model m. Im: C ==> { T, F } MC is a subset of M such that MC = { mi | Imi: C -> T, mi in M }. Suppose two clauses C1 and C2 (C1, C2 in C) have literals unifiable with each other. Let C' be the new clause as a result of the application of resolution on C1, C2. A link sigma is defined as sigma = ( C1, C2, C', |MC'| ), a set consisting of the references to two parents clauses, the resolvent, and the number of models in which the resolvent is evaluated to true. The clause graph G = (C, L) consists of a set of clauses C and a set of links L for all unifiable pairs of literals.
In the set of links L = { sigma1, sigma2, ... , sigman }, we choose a sigmamin with the least number of models |MC'|. The resolvent C' is added to the clause set, sigmamin is removed from the link set, and all links connected to C1 or C2 are potentially copied with substitutions [C'/C1] and [C'/C2] applied. Cnew -> C union { C'(sigmamin) } Lnew -> ( L - sigmamin ) union { sigma | sigmaC1, sigmaC2, [C'/C1], [C'/C2] } G -> (Cnew, Lnew)
In summary, our procedure is as follows. If we want to show Gamma |= phi with phi as theorem from which we generate the set of support, we first generate a set of models from Gamma * with a model generator, concretely FINDER. The next steps are to construct a clause graph from the set of clauses of Gamma * union {NOT phi } * , applying resolution rules to all connected literals in the graph, producing resolvents, evaluating these in the set of models and counting the number of models in which each resolvent is true. We use the links in the clause graph to store the number of models in order not to have to recompute them in each cycle.Just to store the numbers does not make use of the full potential of the connection graph method. In a variant, it would be possible to store the names of the models in each link. This would improve the performance, since better use can be made of the inheritance mechanism. Only those models of the resolvent of the parent link have to be checked and are potentially inherited. Furthermore, the total number of models could be reduced by taking into the reference set only pairwise non-isomorphic models. The selection of the link on which the next step is to be performed follows a heuristic which is based on the number of models in which the resolvent is true.We have also experimented with a variant in which the selection makes a probabilistic choice with a bias towards heuristically promising links, in order to obtain probabilistic completeness. We have no space to describe this here.
The procedure in pseudo code is displayed in Figure 3.
INPUT: Gamma * ;;; hypotheses
phi * ;;; conclusion M := GenerateModels( Gamma * );
G := ConstructGraph( Gamma * union {NOT phi } * );
FOR EVERY L a link in G let
N := number of models of L in M and insert Label(L,N);
REPEAT
Select(L,G);
Res := Resolve(L);
Insert(Res,G);
FOR EVERY L is a link connected to Res let
N := number of models of L in M and insert Label(L,N);
Simplify
UNTIL empty clause derived OR G empty
As an example, let us look at the problem to show that from "orthogonal is symmetric" follows that "parallel is also symmetric" (P stands for parallel and Q for orthogonal):
| Gamma * : | {NOT P(x,y) | Q(x,f(x,y)), |
| NOT P(x,y) | Q(f(x,y),y), | |
| P(x,y) | NOT Q(x,z) | NOT Q(z,y), | |
| NOT Q(x,y) | Q(y,x) } | |
| {NOT phi } * : | { P(a,b), |
| NOT P(b,a) } |
| Models | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | ... | m100 | m101 | m102 | ... | m239 | m240 |
| P(0,0) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| P(0,1) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| P(1,0) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| P(1,1) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| Q(0,0) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| Q(0,1) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| Q(1,0) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| Q(1,1) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| f(0,0) | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | ... | 1 | 1 |
| f(0,1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | ... | 1 | 1 |
| f(1,0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | ... | 1 | 1 |
| f(1,1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | ... | 1 | 1 |
| a | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | ... | 1 | 0 | 0 | ... | 1 | 1 |
| b | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 0 | 1 | ... | 0 | 1 |
Although the semantic guidance results in a smaller search space, it may be expensive to maintain the full set of original models and check every link with respect to all models, of which there are often thousands or more, even with the domain cardinality fixed to 2. Hence it is advisable to keep the reference set manageable in order to make the approach practically feasible.
Table 8 shows the number of models and the number of steps for the example taken from section 5. The second column shows the factor of n which indicates that every nth model is taken from the original model class in order to form a reference set.
| Models | Factor | Steps |
| 240 | 1 | 6 |
| 120 | 2 | 6 |
| 80 | 3 | 6 |
| 60 | 4 | Time out |
| 48 | 5 | 6 |
| 40 | 6 | 6 |
| 30 | 8 | Time out |
| 24 | 10 | 8 |
| 20 | 12 | 11 |
| 15 | 16 | Time out |
| 12 | 20 | Time out |
| 10 | 24 | Time out |
One interesting feature is that the models do not provide good heuristics if the factor is 4, 8 or 16. In this particular example, as we have seen in the table of models, the assumption and conclusion have two Skolem constants a and b. The interpretations of these constants are the same in every 4th model if the domain D is fixed to {0, 1}, which means that if we take every nth model where n is 4i (i integer), both a and b are always interpreted as 1. In the experiments we avoided non-informative subsets of the reference models by choosing the factor to be an odd prime number, such as 7, 11, 13, 17, 19, etc. or by making a random selection.
So far all model sets used in our examples were generated with the domain cardinality fixed to 2. Typically, a bigger cardinality makes each model more informative, and models of cardinality 2 cannot be considered as representative for all finite models. However, as our experiments show, even the class of two element models often bears useful semantic information from which good heuristics for our purpose can be derived. While it is easy to use models with larger cardinality in our approach and to choose randomly a feasible set from all models, the model generation takes typically much longer for higher cardinalities. In the example above, with a cardinality of 3, the number of models is 939078, and with 4 even 4194304. In the following we fix the domain cardinality for our experiments to 2.
| SOS | SCG | Otter | ||||
| Original | Reduced | |||||
| Problem | Steps | Models | Steps | Models | Steps | Steps |
| Par/Orth | 82 | 240 | 6 | 40 | 6 | 17 |
| Pelletier 35 | 0 | 4096 | 0 | 40 | 0 | 0 |
| Pelletier 36 | 4 | 592 | 7 | 53 | 7 | 11 |
| Pelletier 37 | 3 | 664320 | timeout | 67 | 3 | 3 |
| Pelletier 38 | timeout | 4194304 | timeout | 81 | timeout | timeout |
| Pelletier 39* | 5 | 32 | 0 | 32 | 0 | 2 |
| Pelletier 40* | timeout | 128 | 10 | 42 | 14 | 48 |
| Pelletier 41* | 9 | 18 | 2 | 18 | 2 | 2 |
| Pelletier 42* | timeout | 128 | 6 | 42 | 6 | 17 |
| Pelletier 43* | timeout | 640 | 101 | 91 | 101 | 138 |
| Pelletier 44 | 8 | 2336 | 16 | 75 | 18 | 10 |
| Pelletier 45 | timeout | 181760 | timeout | 86 | timeout | 28 |
| Pelletier 46 | timeout | 29968 | timeout | 80 | timeout | 52 |
| Pelletier 47 | timeout | - | - | - | - | timeout |
Although in some examples like Problem 36 and 44 the semantic clause graph prover needs more steps because the problems are relatively easy and the proof may be found in the first several steps by chance with the set of support strategy, in other examples like Problem 39, 40, 41, 42 and 43 the semantic information provides good heuristics for guiding the search. It significantly reduces the number of steps required to search for the proof over the set of support approach.
The right most column is the number of clauses that Otter generates. Please note that our implementation is only a prototype and wants to show the potential of the approach. On the one hand, the current implementation is based on the Lisp extension KEIM [HKKMx94b] and is not a very fast system. Furthermore, semantic guidance is the only search control it has, while Otter employs a variety of sophisticated search control mechanisms. On the other hand, each link in SCG already contains a resolvent rather than a unifier. For this reason, a comparison of the run time behaviour of the systems would not be informative. In order to make a fair comparison with the Otter prover, it would be necessary to integrate the approach directly in Otter.
We have described the semantic clause graph prover and presented experimental results. Our approach differs from other semantically guided provers in that it generates models only once at the beginning and uses them during the rest of the procedure, while other semantic approaches like SCOTT [SLM94] generate models not only from the initial clauses but also from new ones. This is an advantage of our approach since in general model generation is computationally much more expensive than checking. Of course, we lose SCOTT's advantage to have a model set which intimately reflects the current state of the proof search.
The performance of semantically guided resolution depends to a great degree on the availability of good models. Our experiments which make use of two-element models show that even there valuable heuristic information can be obtained. To which degree other model classes, like non-isomorphic models, are more promising or not has to be studied for different example classes. Semantic guidance seems to be a standard technique humans use when searching for proofs. Rather than generating models from scratch, they often use typical examples. It remains an open question whether and to which extent this is necessary in our context. It may also be important to use infinite models as investigated by Caferra and Peltier [CP98].
The semantic clause graph prover can be extended by adding a pre-processor of semantic refinements of assumption clauses [Choi02a]. The idea is to use models to guide the search, not as in other approaches that backwardly guide the search for an empty clause between the assumptions and the negated conclusion, but forwardly rewrite the assumptions to a form that is semantically closer to the conclusion. Once the potential of semantic guidance is better understood, the heuristic information can be used alongside other types of heuristic information, which constitutes the strength of up-to-date systems.