Semantic Selection for Resolution in Clause Graphs

Seungyeob Choi

Manfred Kerber

School of Computer Science
The University of Birmingham
Birmingham B15 2TT, England
http://www.cs.bham.ac.uk/~mmk
The second author likes to thank Norman Foo and the Knowledge Systems Group at the AI Department of the University of New South Wales for their hospitality.

ABSTRACT

In this contribution we present a variant of a resolution theorem prover which selects resolution steps based on the proportion of models a newly generated clause satisfies compared to all models given in a reference class. This reference class is generated from a subset of the initial clause set. Since the empty clause does not satisfy any models, preference is given to such clauses which satisfy few models only. Because computing the number of models is computationally expensive on the one hand, but will remain almost unchanged after the application of one single resolution step on the other hand, we adapt Kowalski's connection graph method to store the number of models at each link.

1.  Introduction

Resolution [Robinson65] is probably the best developed approach to automated theorem proving, in which the problem is transformed into a clausal form and a proof is searched for on this clause level. Many powerful systems have been built on this paradigm, e.g. MKRP [EO86], Otter [McCune94], Spass [WGR96], and Vampire [RV01]. On the one hand, these systems show remarkable performance on many problems in particular application areas. Recently a variant of Otter successfully solved the Robbins problem [McCune97] that had remained as an open problem for several decades. On the other hand, since these methods depend largely on blind search, exponential explosion is unavoidable. The key technique consists of efficiently searching through a big search space and making good heuristic choices.

Semantic information seems, to a large extent, to be used as a form of heuristic knowledge in human problem solving. Therefore it is natural to think that automated theorem provers can also benefit from the use of semantic information. Some work has been done to integrate semantic information into resolution theorem proving. Slaney et al. [SLM94,HS01] developed the SCOTT-system which combines the model generator FINDER [Slaney94] and the resolution theorem prover Otter. To speak in a simplified way, SCOTT extends the set of support strategy by the introduction of a semantic satisfiability check, which is based on iterated model generation. It could be shown that this approach makes a significant improvement for some examples. Plaisted et al. [CP94,PZ00] implemented a theorem prover using hyper-linking techniques [LP92] with the concept of semantic guidance.

The connection graph proof procedure was first introduced by Kowalski [Kowalski75]. The main idea is to explicitly represent all resolution possibilities as links in a graph between literals, where the links store not only the applicability but additionally the most general unifier for the application. MKRP is a big theorem proving system which was built in the 1970's and 1980's based on the connection graph method. For a recent overview on the clause graph procedure see [SW02]. Independently of open theoretical questions like completeness investigated by Eisinger [Eisinger91] the method has not attracted much attention in more recent resolution theorem provers since the computational overhead to explicitly represent the unifiers in a graph does not pay off. Other techniques like indexing are more important for efficiency. In other words the graph is used to store information which can be more cheaply computed from first principles. In particular, the heuristic choice on which link to resolve only marginally makes use of the links.

The approach presented here can be seen as an attempt to use the connection graph in resolution theorem proving in a more promising way, namely heuristic guidance is provided by the number of models of clauses. First a number of models is generated from the given theory, then a clause graph is constructed out of the set of all clauses, and finally a proof is searched for by heuristic selection of the links. The heuristic selects a link that is most promising in the sense that it has a minimal number of models which fulfil its offspring. Our prototype implementation is built using KEIM [HKKMx94b]. Note that an adaptation of the semantic guidance presented in this paper can be applied to clause selection in other resolution theorem provers as well. We chose the clause graph as it provides a convenient data structure to store the relevant information.

2.  The Clause Graph Procedure

The clause graph represents the set of clauses as a graph in which each node corresponds to a literal. The graph consists of one or more groups of literals and each group is a clause. Two complementary unifiable literals are connected by a link which indicates the unification possibility between the literals.

Suppose we have a clause set {NOT  P(a), P(x)  |  NOT  Q(x)  |  R(x), Q(a)  |  NOT  R(y)}, the set has 3 clauses and 6 literals. Since NOT  P(a) and P(x) can be resolved by a common unifier, they are linked together.

1. Example of an initial clause graph

2. Clause graph after one step

We select one of the links in the graph, apply the resolution rule, remove the link, create the new clause which may inherit literals from both ancestors, and insert the new links for this new clause. The new links are a subset of the links of the parent clauses. For instance from the graph in Figure 1 we obtain by resolving on the rightmost link between R(x) and NOT  R(y) the clause graph (unifiers not displayed) in Figure 2.

Since the link resolved upon is deleted, the clause graph now contains two pure literals, R(x) and NOT  R(y), which have no links to others. The clauses containing pure literals can be removed from the clause graph, since they cannot have the empty clause as their successor. When a clause is removed, all links connected to the clause are also removed as no links can exist without nodes. The removal of a link may produce a further pure literal in the clause connected to the opposite end, which can also be removed. In fact, an application of a resolution step can cause a chain reaction of reductions, which in extreme cases can lead to the collapse of the whole graph, showing that the original problem was satisfiable. This is the case for the graph in Figure 2.

3.  Model Generation and Semantically Guided Resolution

The semantics of a set of first-order formulas is represented by a pair M =(D, I), consisting of a domain and an interpretation. The domain D is a nonempty set of objects, and the interpretation I is a mapping from the terms of the language to values in the domain, and formulas are mapped to truth values. We say M is a model of a formula (or formula set) if the formula is (or all formulas in the formula set are) evaluated to true.

Assume we have a first-order problem given by a formula set Gamma and formula phi for which we want to show that phi follows from Gamma (i.e., Gamma |= phi ). Furthermore we assume that the formula set Gamma is consistent, that is, that it has models. In resolution theorem proving phi is negated and it is proved that Gamma  union {NOT  phi } is unsatisfiable, that is, has no model.

For our procedure we make the assumption - as in set of support - that the background theory Gamma is consistent. Hence it has a model. Furthermore the method as described below makes use of the assumption that there are finite models for Gamma since we make use of a model generator which generates finite models (If no finite models are found, the method reduces to the standard connection graph method). This is, however, an assumption that can be relaxed if we proceed to a model generator which can generate infinite models. Anyway, we think that the class of problems, for which finite models are sufficient, is rich enough that it deserves investigation.

Slaney's FINDER system [Slaney94] is a model generator which fixes the domain to a specific number of objects and then performs an exhaustive search in the space of functions and constants which can be satisfied in that domain. We use FINDER as a model generator to generate a set of models M = { m1, m2,... , mn } from a formula set Gamma ={ phi 1, phi 2,... , phi m }.

FINDER generates models with a fixed cardinality of the domain D. It is possible to give lower and upper bounds for the cardinality of D. We may either take all models found in a certain bound or limit the result to a certain number of models we are interested in. We have experimented which number of models should be considered best. In our experiments, as described in section 7, we use a cardinality of the reference set of models M between 40 and 100.

4.  The Clause Graph with Semantic Guidance

Let Gamma be a set of assumptions and phi be the conclusion, such that Gamma |= phi , and let Gamma * and { NOT  phi } * be the sets of clauses obtained from normalisation of Gamma and { NOT  phi }, respectively. The resolution procedure searches for a contradiction (an empty clause) from Gamma *  union  { NOT  phi } * . The finite model set M = { m1, m2, ... , mn | mi |= Gamma * } is generated from the assumptions Gamma * with the interpretation domain fixed to a finite set of objects.Note that the models have to interpret any additional symbols in phi * as well. Let C be the corresponding initial clause set. C = Gamma *  union  { NOT  phi } * We use Im to denote the interpretation of a clause C (C in C) using a model m. Im: C ==> { T, F } MC is a subset of M such that MC = { mi | Imi: C  ->  T, mi in M }. Suppose two clauses C1 and C2 (C1, C2 in C) have literals unifiable with each other. Let C' be the new clause as a result of the application of resolution on C1, C2. A link sigma is defined as sigma = ( C1, C2, C', |MC'| ), a set consisting of the references to two parents clauses, the resolvent, and the number of models in which the resolvent is evaluated to true. The clause graph G = (C, L) consists of a set of clauses C and a set of links L for all unifiable pairs of literals.

In the set of links L = { sigma1, sigma2, ... , sigman }, we choose a sigmamin with the least number of models |MC'|. The resolvent C' is added to the clause set, sigmamin is removed from the link set, and all links connected to C1 or C2 are potentially copied with substitutions [C'/C1] and [C'/C2] applied. Cnew -> C  union  { C'(sigmamin) } Lnew -> ( L - sigmamin )  union  { sigma | sigmaC1, sigmaC2, [C'/C1], [C'/C2] } G -> (Cnew, Lnew)

In summary, our procedure is as follows. If we want to show Gamma |= phi with phi as theorem from which we generate the set of support, we first generate a set of models from Gamma * with a model generator, concretely FINDER. The next steps are to construct a clause graph from the set of clauses of Gamma *  union {NOT  phi } * , applying resolution rules to all connected literals in the graph, producing resolvents, evaluating these in the set of models and counting the number of models in which each resolvent is true. We use the links in the clause graph to store the number of models in order not to have to recompute them in each cycle.Just to store the numbers does not make use of the full potential of the connection graph method. In a variant, it would be possible to store the names of the models in each link. This would improve the performance, since better use can be made of the inheritance mechanism. Only those models of the resolvent of the parent link have to be checked and are potentially inherited. Furthermore, the total number of models could be reduced by taking into the reference set only pairwise non-isomorphic models. The selection of the link on which the next step is to be performed follows a heuristic which is based on the number of models in which the resolvent is true.We have also experimented with a variant in which the selection makes a probabilistic choice with a bias towards heuristically promising links, in order to obtain probabilistic completeness. We have no space to describe this here.

The procedure in pseudo code is displayed in Figure 3.

INPUT:  Gamma  *    ;;; hypotheses
        phi  *    ;;; conclusion

M := GenerateModels( Gamma * ); G := ConstructGraph( Gamma *  union {NOT  phi } * ); FOR EVERY L a link in G let N := number of models of L in M and insert Label(L,N); REPEAT Select(L,G); Res := Resolve(L); Insert(Res,G); FOR EVERY L is a link connected to Res let N := number of models of L in M and insert Label(L,N); Simplify UNTIL empty clause derived OR G empty

3. Pseudo code for the semantic clause graph procedure

An abstract view of the data flow can be found as a flow chart in Figure 4. The following is an abstract description of the whole system. The most important parts are the model generator, the initialising procedure, and the selection procedure. The model generator takes the input theory and generates models of it. The initialising procedure builds the initial clause graph with literal nodes and links; it calculates semantic plausibility for all links and stores them in the links. The selection procedure chooses the links on which to operate next.

4. Proving procedure flowchart

5.  An Example Problem

As an example, let us look at the problem to show that from "orthogonal is symmetric" follows that "parallel is also symmetric" (P stands for parallel and Q for orthogonal):

{FORALL x FORALL y ( P(x,y)  <->  EXISTS z ( Q(x,z)  &  Q(z,y) )),
FORALL x FORALL y ( Q(x,y)  ->  Q(y,x) )}
|= FORALL x FORALL y ( P(x,y)  ->  P(y,x) )
To prove the consequence relation by resolution we assume the first set Gamma = {(FORALL x FORALL y ( P(x,y)  <->  EXISTS z ( Q(x,z)  &  Q(z,y) )))  &  (FORALL x FORALL y ( Q(x,y)  ->  Q(y,x) )) } and negate the second part phi = FORALL x FORALL y ( P(x,y)  ->  P(y,x) ). By normalisation we get:
Gamma * :   {NOT  P(x,y)  |  Q(x,f(x,y)),
  NOT  P(x,y)  |  Q(f(x,y),y),
  P(x,y)  |  NOT  Q(x,z)  |  NOT  Q(z,y),
  NOT  Q(x,y)  |  Q(y,x) }
{NOT  phi } * :   { P(a,b),
  NOT  P(b,a) }

If we fix the cardinality to 2, FINDER selects an interpretation domain to D = { 0,1 } and generates models in Table 5 from Gamma * .

Models   m1   m2   m3   m4   m5   m6   m7   m8   ...   m100   m101   m102   ...   m239   m240
P(0,0)   F   F   F   F   F   F   F   F   ...   T   T   T   ...   T   T
P(0,1)   F   F   F   F   F   F   F   F   ...   F   F   F   ...   T   T
P(1,0)   F   F   F   F   F   F   F   F   ...   F   F   F   ...   T   T
P(1,1)   F   F   F   F   F   F   F   F   ...   T   T   T   ...   T   T
Q(0,0)   F   F   F   F   F   F   F   F   ...   F   F   F   ...   T   T
Q(0,1)   F   F   F   F   F   F   F   F   ...   T   T   T   ...   T   T
Q(1,0)   F   F   F   F   F   F   F   F   ...   T   T   T   ...   T   T
Q(1,1)   F   F   F   F   F   F   F   F   ...   F   F   F   ...   T   T
f(0,0)   0   0   0   0   1   1   1   1   ...   1   1   1   ...   1   1
f(0,1)   0   0   0   0   0   0   0   0   ...   0   1   1   ...   1   1
f(1,0)   0   0   0   0   0   0   0   0   ...   0   0   0   ...   1   1
f(1,1)   0   0   0   0   0   0   0   0   ...   0   0   0   ...   1   1
a   0   0   1   1   0   0   1   1   ...   1   0   0   ...   1   1
b   0   1   0   1   0   1   0   1   ...   1   0   1   ...   0   1
5. Models generated by FINDER

The number of models generated from Gamma is 240. The procedure is now set up to count for each possible step in the calculus (resolution, factorisation) how many interpretations in M are models of the resulting clause. Figure 6 shows the clause graph labelled with semantic information and three links (Link-1, Link-2, Link-3) have fewer models than others do.

6. Initial clause graph

As shown in Figure 7 the application of resolution to Link-1, Link-2 and Link-3 give rise to Resolvent-1, Resolvent-2, and Resolvent-3, respectively. Those resolvents inherit links from their parents. While the number of models of each link needs to be recalculated, this is computationally not too expensive compared to other forms of heuristics. Furthermore as remarked in footnote above the approach can be refined to minimise this by making stronger usage of the inheritance. All the models of an inherited link must be a subset of the model set of the original link of the parent clause, which makes recalculation even easier. In Figure 7, the model numbers of the inherited links are 128, 216 and 224. If there is more than one link with the same minimal number of models, the one that has been created first is selected. Two clauses P(a,b) and NOT  P(b,a) now have pure literals and are removed from the graph.

7. Clause graph after 3 steps

In Figure 7, there are three new resolvents and ten new links. We display only the number of models for the ten newly generated links. Six of the links have 128 models, two have 216, and two have 224. So far the smallest number is still 128, and Link-4 is selected because it was the one created first. When we resolve on Link-4, five new links are created. One of them has only 112 models which is the smallest so far. We select it and apply resolution. The new resolvent also has a link with 112 models and it should be selected again. After selecting it and making another new resolvent, three new links are made and they have 128, 112, and 0 models, respectively. The link of 0 models is between the clauses NOT  Q(f(a,b),b) and Q(f(a,b),b) and produces the empty clause as a result of a resolution step.

6.  Considerations into the Class of Reference Models

Although the semantic guidance results in a smaller search space, it may be expensive to maintain the full set of original models and check every link with respect to all models, of which there are often thousands or more, even with the domain cardinality fixed to 2. Hence it is advisable to keep the reference set manageable in order to make the approach practically feasible.

Table 8 shows the number of models and the number of steps for the example taken from section 5. The second column shows the factor of n which indicates that every nth model is taken from the original model class in order to form a reference set.

Models   Factor   Steps
240   1   6
120   2   6
80   3   6
60   4   Time out
48   5   6
40   6   6
30   8   Time out
24   10   8
20   12   11
15   16   Time out
12   20   Time out
10   24   Time out

8. Reduced set of reference models

One interesting feature is that the models do not provide good heuristics if the factor is 4, 8 or 16. In this particular example, as we have seen in the table of models, the assumption and conclusion have two Skolem constants a and b. The interpretations of these constants are the same in every 4th model if the domain D is fixed to {0, 1}, which means that if we take every nth model where n is 4i (i integer), both a and b are always interpreted as 1. In the experiments we avoided non-informative subsets of the reference models by choosing the factor to be an odd prime number, such as 7, 11, 13, 17, 19, etc. or by making a random selection.

So far all model sets used in our examples were generated with the domain cardinality fixed to 2. Typically, a bigger cardinality makes each model more informative, and models of cardinality 2 cannot be considered as representative for all finite models. However, as our experiments show, even the class of two element models often bears useful semantic information from which good heuristics for our purpose can be derived. While it is easy to use models with larger cardinality in our approach and to choose randomly a feasible set from all models, the model generation takes typically much longer for higher cardinalities. In the example above, with a cardinality of 3, the number of models is 939078, and with 4 even 4194304. In the following we fix the domain cardinality for our experiments to 2.

7.  Experiments on Pelletier Examples

  SOS SCG   Otter
  Original Reduced  
Problem   Steps   Models   Steps   Models   Steps   Steps
Par/Orth   82   240   6   40   6   17
Pelletier 35   0   4096   0   40   0   0
Pelletier 36   4   592   7   53   7   11
Pelletier 37   3   664320   timeout   67   3   3
Pelletier 38   timeout   4194304   timeout   81   timeout   timeout
Pelletier 39*   5   32   0   32   0   2
Pelletier 40*   timeout   128   10   42   14   48
Pelletier 41*   9   18   2   18   2   2
Pelletier 42*   timeout   128   6   42   6   17
Pelletier 43*   timeout   640   101   91   101   138
Pelletier 44   8   2336   16   75   18   10
Pelletier 45   timeout   181760   timeout   86   timeout   28
Pelletier 46   timeout   29968   timeout   80   timeout   52
Pelletier 47   timeout   -   -   -   -   timeout

SOS, SCG: The number of steps the semantic clause graph procedure takes.
Otter: The number of clauses Otter generates.
*: The semantic clause graph takes fewer steps than the set of support.
-: means that the model generation itself took longer than 12 hours.

9. Experiments on Pelletier examples

We have tested our semantic clause graph theorem prover (SCG) with full predicate logic problems from Pelletier's examples [Pelletier86]. Table 9 shows how many steps the prover takes to prove each exampleSince SCG uses a different algorithm from Otter, we apply different measurements - the number of clauses generated and added to the set of support in Otter, and the number of clauses selected and added to the clause graph in SCG. In SCG, we use a step to denote a resolution step which includes selecting a pair of clauses, resolving them, adding the resolvent into the clause set (or the clause graph in SCG), and updating the set (or the graph in SCG). In Otter the number of steps is measured by the number of clauses generated.. Par/Orth is the parallelity and orthogonality problem presented in section 5, and Problem 35-47 are all examples of full predicate logic without identity and functions from Pelletier's problem set [Pelletier86]. The left most column is the number of steps our set of support strategy prover (SOS) takes to prove the problem. The prover is also based on the same clause graph but, instead of semantic guidance, it makes use of the set of support strategy [WRC65]. The next four columns show the numbers of models in both the original and reduced set and the numbers of steps it takes to prove the problem with the model set. The proof found in step 0 means that it was found during the construction of the initial graph. The number n of models in the reduced sets is taken so that it should be in the range of 40 to 100, which seems to be a good choice. When the prover does not produce the result in a reasonable amount of time (in our experiments 12 hours on a SunE420R with four 450MHz processors), the step is rated as time out.

Although in some examples like Problem 36 and 44 the semantic clause graph prover needs more steps because the problems are relatively easy and the proof may be found in the first several steps by chance with the set of support strategy, in other examples like Problem 39, 40, 41, 42 and 43 the semantic information provides good heuristics for guiding the search. It significantly reduces the number of steps required to search for the proof over the set of support approach.

The right most column is the number of clauses that Otter generates. Please note that our implementation is only a prototype and wants to show the potential of the approach. On the one hand, the current implementation is based on the Lisp extension KEIM [HKKMx94b] and is not a very fast system. Furthermore, semantic guidance is the only search control it has, while Otter employs a variety of sophisticated search control mechanisms. On the other hand, each link in SCG already contains a resolvent rather than a unifier. For this reason, a comparison of the run time behaviour of the systems would not be informative. In order to make a fair comparison with the Otter prover, it would be necessary to integrate the approach directly in Otter.

8.  Conclusion and Future Work

We have described the semantic clause graph prover and presented experimental results. Our approach differs from other semantically guided provers in that it generates models only once at the beginning and uses them during the rest of the procedure, while other semantic approaches like SCOTT [SLM94] generate models not only from the initial clauses but also from new ones. This is an advantage of our approach since in general model generation is computationally much more expensive than checking. Of course, we lose SCOTT's advantage to have a model set which intimately reflects the current state of the proof search.

The performance of semantically guided resolution depends to a great degree on the availability of good models. Our experiments which make use of two-element models show that even there valuable heuristic information can be obtained. To which degree other model classes, like non-isomorphic models, are more promising or not has to be studied for different example classes. Semantic guidance seems to be a standard technique humans use when searching for proofs. Rather than generating models from scratch, they often use typical examples. It remains an open question whether and to which extent this is necessary in our context. It may also be important to use infinite models as investigated by Caferra and Peltier [CP98].

The semantic clause graph prover can be extended by adding a pre-processor of semantic refinements of assumption clauses [Choi02a]. The idea is to use models to guide the search, not as in other approaches that backwardly guide the search for an empty clause between the assumptions and the negated conclusion, but forwardly rewrite the assumptions to a form that is semantically closer to the conclusion. Once the potential of semantic guidance is better understood, the heuristic information can be used alongside other types of heuristic information, which constitutes the strength of up-to-date systems.

References


© 2002, Springer-Verlag. Seungyeob Choi, Manfred Kerber
The URL of this page is http://www.cs.bham.ac.uk/~mmk/papers/02-AAI-clausegraph.html.
Also available as pdf ps.gz bib .