The University of Birmingham

Birmingham B15 2TT, England

http://www.cs.bham.ac.uk/~mmk

The second author would like to thank Norman Foo and the Knowledge Systems Group at the AI Department of the University of New South Wales for their hospitality.

**ABSTRACT**

In this contribution we present a variant of a resolution theorem prover that selects resolution steps according to the proportion of models, out of a reference class, in which a newly generated clause is true. This reference class is generated from a subset of the initial clause set. Since the empty clause is true in no model, preference is given to clauses which are true in few models only. Because computing the number of models is computationally expensive on the one hand, but the numbers remain almost unchanged after the application of a single resolution step on the other hand, we adapt Kowalski's connection graph method to store the number of models at each link.

Resolution [Robinson65] is probably the best developed approach
to automated theorem proving, in which the problem is transformed into
a clausal form and a proof is searched for on this clause level. Many
powerful systems have been built on this paradigm, e.g. `MKRP` [EO86], Otter [McCune94], Spass [WGR96], and
Vampire [RV01]. On the one hand, these systems show remarkable
performance on many problems in particular application areas. Recently
a variant of Otter successfully solved the Robbins problem
[McCune97], which had remained open for several
decades. On the other hand, since these methods depend largely on
blind search, exponential explosion is unavoidable. The key technique
consists of efficiently searching through a big search space and
making good heuristic choices.

Semantic information seems, to a large extent, to be used as a form of
heuristic knowledge in human problem solving. Therefore it is natural
to think that automated theorem provers can also benefit from the use
of semantic information. Some work has been done to integrate semantic
information into resolution theorem proving. Slaney et al.
[SLM94,HS01] developed the `SCOTT`-system which combines the model
generator `FINDER` [Slaney94] and the resolution theorem prover
Otter. Put simply, `SCOTT` extends the set of
support strategy by a semantic satisfiability check, which is based on
iterated model generation. This approach has been shown to yield a
significant improvement on some examples. Plaisted et al. [CP94,PZ00] implemented a theorem
prover using *hyper-linking* techniques [LP92] with the
concept of semantic guidance.

The connection graph proof procedure was first introduced by Kowalski
[Kowalski75]. The main idea is to explicitly represent all
resolution possibilities as links in a graph between literals, where
the links store not only the applicability but additionally the most
general unifier for the application. `MKRP` is a big theorem proving
system which was built in the 1970's and 1980's based on the
connection graph method. For a recent overview on the clause graph
procedure see [SW02]. Independently of open theoretical questions,
such as the completeness issues investigated by Eisinger [Eisinger91], the
method has not attracted much attention in more recent resolution
theorem provers, since the computational overhead of explicitly
representing the unifiers in a graph does not pay off. Other techniques,
such as indexing, are more important for efficiency. In other words, the
graph is used to store information which can be computed more cheaply
from first principles. In particular, the heuristic choice of which
link to resolve upon makes only marginal use of the links.

The approach presented here can be seen as an attempt to use the
connection graph in resolution theorem proving in a more
promising way, namely heuristic guidance is provided by the number of
models of clauses. First a number of models is generated from the
given theory, then a clause graph is constructed out of the set of all
clauses, and finally a proof is searched for by heuristic selection of
the links. The heuristic selects a link that is most promising in the
sense that it has a minimal number of models which fulfil its
offspring. Our prototype implementation is built using `KEIM`
[HKKMx94b]. Note that an adaptation of the semantic guidance
presented in this paper can be applied to clause selection in other
resolution theorem provers as well. We chose the clause graph as it
provides a convenient data structure to store the relevant
information.

The clause graph represents the set of clauses as a graph in which each node corresponds to a literal. The nodes are grouped, and each group of literals forms a clause. Two complementary unifiable literals are connected by a link, which records the unification possibility between them.

Suppose we have the clause set {NOT P(a), P(x) | NOT Q(x) | R(x), Q(a) | NOT R(y)}, consisting of 3 clauses with 6 literals. Since NOT P(a) and P(x) are complementary and unifiable, they are linked together.
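To make the link construction concrete, here is a minimal Python sketch; this is our own toy encoding, not the `MKRP` or `KEIM` implementation. A literal is a triple of a sign, a predicate symbol, and an argument tuple; variables are strings starting with `?`; and a link is created between two complementary literals whenever their argument lists unify. (For brevity, variables are not renamed apart between clauses and the occurs check is omitted, both of which a real prover needs.)

```python
def unify(s, t, subst):
    """Extend substitution subst so that it unifies terms s and t, or
    return None.  Terms are constants ("a"), variables ("?x"), or
    function applications ("f", arg1, arg2).  No occurs check."""
    s, t = subst.get(s, s), subst.get(t, t)
    if s == t:
        return subst
    if isinstance(s, str) and s.startswith("?"):
        return {**subst, s: t}
    if isinstance(t, str) and t.startswith("?"):
        return {**subst, t: s}
    if isinstance(s, tuple) and isinstance(t, tuple) \
            and len(s) == len(t) and s[0] == t[0]:
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

def links(clauses):
    """All links: pairs of complementary literals (opposite sign, same
    predicate) whose argument lists unify, together with the mgu."""
    result = []
    for i, c1 in enumerate(clauses):
        for j in range(i + 1, len(clauses)):
            for lit1 in c1:
                for lit2 in clauses[j]:
                    if lit1[0] != lit2[0] and lit1[1] == lit2[1]:
                        mgu = unify(("args", *lit1[2]), ("args", *lit2[2]), {})
                        if mgu is not None:
                            result.append((i, j, lit1, lit2, mgu))
    return result
```

For the example clause set this yields exactly three links: NOT P(a)-P(x), NOT Q(x)-Q(a), and R(x)-NOT R(y).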

Since the link resolved upon is deleted, the clause graph now contains
two *pure* literals, R(x) and NOT R(y), which have no links
to others. The clauses containing pure literals can be removed from
the clause graph, since they cannot have the empty clause as their
successor. When a clause is removed, all links connected to the clause
are also removed as no links can exist without nodes. The removal of a
link may produce a further pure literal in the clause connected to the
opposite end, which can also be removed. In fact, an application of a
resolution step can cause a chain reaction of reductions, which in
extreme cases can lead to the collapse of the whole graph, showing that
the original problem was satisfiable. This is the case for the
graph in Figure 2.
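The chain reaction of pure-literal deletions can be sketched as a small fixpoint loop. This is a toy Python model with our own encoding: literals are plain strings and a link is a tuple naming its two endpoints by clause index and literal.

```python
def purge_pure(clauses, links):
    """Repeatedly remove clauses containing a pure literal (one with no
    remaining link); removing a clause deletes its links, which may make
    further literals pure -- the chain reaction described above."""
    clauses = dict(enumerate(clauses))      # clause index -> list of literals
    links = set(links)                      # {(i, lit_in_i, j, lit_in_j), ...}
    changed = True
    while changed:
        changed = False
        for i, clause in list(clauses.items()):
            # literals of clause i that still carry at least one link
            linked = {lit for (a, la, b, lb) in links
                      for (c, lit) in ((a, la), (b, lb)) if c == i}
            if any(lit not in linked for lit in clause):   # pure literal found
                del clauses[i]
                links = {lk for lk in links if i not in (lk[0], lk[2])}
                changed = True
    return list(clauses.values()), links
```

On a toy graph with clauses {P}, {NOT P, Q}, {R} and a single link between P and NOT P, the literal Q is pure, its removal makes P pure, and the whole graph collapses, mirroring the satisfiable case described above.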

The semantics of a set of first-order formulas is represented by a
pair M = (D, I), where D is a domain and I an interpretation of the
constant, function, and predicate symbols.

Assume we have a first-order problem given by a formula set Gamma and formula phi for which we want to show that phi follows from Gamma (i.e., Gamma |= phi ). Furthermore we assume that the formula set Gamma is consistent, that is, that it has models. In resolution theorem proving phi is negated and it is proved that Gamma union {NOT phi } is unsatisfiable, that is, has no model.

For our procedure we make the assumption, as in the set of support strategy, that the background theory Gamma is consistent, hence it has a model. Furthermore, the method as described below assumes that Gamma has finite models, since we make use of a model generator which generates finite models (if no finite models are found, the method reduces to the standard connection graph method). This assumption can, however, be relaxed if we proceed to a model generator which can generate infinite models. In any case, we believe that the class of problems for which finite models are sufficient is rich enough to deserve investigation.

Slaney's `FINDER` system [Slaney94] is a model generator which
fixes the domain to a specific number of objects and then performs an
exhaustive search in the space of functions and constants which can be
satisfied in that domain. We use `FINDER` as a model generator to
generate a set of models M = {m_1, ..., m_k} of the background theory.

`FINDER` generates models with a fixed cardinality of the domain
D. It is possible to give lower and upper bounds for the
cardinality of D.

Let Gamma be a set of assumptions and phi be the conclusion, such
that Gamma |= phi , and let Gamma ^{ * } and { NOT phi }^{ * } be the sets of clauses
obtained from normalisation of Gamma and { NOT phi },
respectively. The resolution procedure searches for a
contradiction (the empty clause) from Gamma ^{ * } union { NOT phi
}^{ * }. The finite model set M = {m_1, ..., m_k} generated from Gamma ^{ * } serves as the reference class for the heuristic evaluation of clauses.
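How a clause is evaluated in one of these finite models can be sketched as follows. This is a toy Python model checker with our own encoding, not `FINDER`'s: a model maps constants to domain elements and function/predicate symbols to tables, and a clause, with its variables read as universally quantified, is true in a model iff under every assignment of domain elements to its variables some literal is true.

```python
from itertools import product

def variables_of(term):
    """Variables (strings starting with '?') occurring in a term."""
    if isinstance(term, tuple):                        # function application
        return {v for arg in term[1:] for v in variables_of(arg)}
    return {term} if term.startswith("?") else set()

def eval_term(term, model, env):
    if isinstance(term, tuple):                        # e.g. ("f", "?x", "?y")
        return model[term[0]][tuple(eval_term(a, model, env) for a in term[1:])]
    return env[term] if term.startswith("?") else model[term]

def clause_true(clause, model, domain):
    """True iff the clause holds under every variable assignment."""
    vs = sorted({v for (_, _, args) in clause
                 for a in args for v in variables_of(a)})
    for values in product(domain, repeat=len(vs)):
        env = dict(zip(vs, values))
        if not any(model[p][tuple(eval_term(a, model, env) for a in args)] == pos
                   for (pos, p, args) in clause):
            return False
    return True

def count_models(clause, models, domain):
    """The number stored at a link for the link's resolvent."""
    return sum(clause_true(clause, m, domain) for m in models)
```

For instance, in the model m_1 of the table below (all P and Q false, f constantly 0, a = b = 0), the clause NOT Q(x,y) | Q(y,x) is true, while the unit clause P(a,b) is false.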

Each link sigma in the set of links L stores, in addition to the most general unifier, the number of reference models in which the corresponding resolvent is true.

In summary, our procedure is as follows. If we want to show Gamma
|= phi, with phi as the theorem from which we generate the set of
support, we first generate a set of models from Gamma ^{ * } with a model
generator, concretely `FINDER`. The next steps are to construct a
clause graph from the set of clauses of Gamma ^{ * } union {NOT phi }^{ * },
to apply resolution rules to all connected literals in the graph,
producing resolvents, and to evaluate these in the set of models,
counting the number of models in which each resolvent is true. We use
the links in the clause graph to store these numbers in order not to
have to recompute them in each cycle. *Just to store the numbers does
not exploit the full potential of the connection graph method. In a
variant, it would be possible to store the names of the models in each
link. This would improve the performance, since better use can be made
of the inheritance mechanism: only those models of the resolvent of
the parent link have to be checked and are potentially
inherited. Furthermore, the total number of models could be reduced by
taking into the reference set only pairwise non-isomorphic models.*
The selection of the link on which the next step is to be performed
follows a heuristic which is based on the number of models in which
the resolvent is true. *We have also experimented with a
variant in which the selection makes a probabilistic choice with a
bias towards heuristically promising links, in order to obtain
probabilistic completeness. We have no space to describe this here.*

The procedure in pseudo code is displayed in Figure 3.

    INPUT: Gamma^*           ;;; hypotheses
           phi^*             ;;; conclusion
    M := GenerateModels(Gamma^*);
    G := ConstructGraph(Gamma^* union {NOT phi}^*);
    FOR EVERY link L in G
        let N := number of models of L in M and insert Label(L,N);
    REPEAT
        Select(L,G);
        Res := Resolve(L);
        Insert(Res,G);
        FOR EVERY link L connected to Res
            let N := number of models of L in M and insert Label(L,N);
        Simplify
    UNTIL empty clause derived OR G empty
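The same loop can be illustrated on a deliberately simplified ground (propositional) version in Python. This is our own toy sketch, not the `KEIM`-based prototype: it omits unification, the explicit graph, and simplification. A clause is a frozenset of integers (+n for an atom, -n for its negation), a model is the set of atoms it makes true, and in each round the resolvent that is true in the fewest reference models is added first. Note that the empty clause is true in no model, so it always has the minimal count 0.

```python
from itertools import combinations

def true_in(clause, model):
    """A clause (frozenset of signed atoms) is true in a model
    (set of true atoms) iff some literal is true in it."""
    return any((lit > 0) == (abs(lit) in model) for lit in clause)

def model_count(clause, models):
    return sum(true_in(clause, m) for m in models)

def prove(clauses, models):
    """Resolve, always preferring the resolvent that is true in the
    fewest reference models; True iff the empty clause is derived."""
    clauses = set(clauses)
    while True:
        candidates = []
        for c1, c2 in combinations(clauses, 2):
            for lit in c1:
                if -lit in c2:
                    res = (c1 - {lit}) | (c2 - {-lit})
                    if res not in clauses:
                        candidates.append((model_count(res, models), res))
        if not candidates:
            return False        # saturated without refutation
        count, best = min(candidates, key=lambda c: c[0])
        if not best:
            return True         # empty clause derived (count is 0)
        clauses.add(best)
```

For example, with the clauses {1}, {-1, 2} as background theory (one of its models is {1, 2}) and the negated conclusion {-2}, the procedure derives the empty clause; without the negated conclusion it saturates.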

An abstract view of the data flow can be found as a flow chart in Figure 4. The following is an abstract description of the whole system. The most important parts are the model generator, the initialising procedure, and the selection procedure. The model generator takes the input theory and generates models of it. The initialising procedure builds the initial clause graph with literal nodes and links; it calculates semantic plausibility for all links and stores them in the links. The selection procedure chooses the links on which to operate next.

As an example, let us look at the problem of showing that from "orthogonal is symmetric" it follows that "parallel is also symmetric" (P stands for parallel and Q for orthogonal):

FORALL x FORALL y ( Q(x,y) -> Q(y,x) )

|= FORALL x FORALL y ( P(x,y) -> P(y,x) )

Gamma ^{ * } : { NOT P(x,y) | Q(x,f(x,y)),

NOT P(x,y) | Q(f(x,y),y),

P(x,y) | NOT Q(x,z) | NOT Q(z,y),

NOT Q(x,y) | Q(y,x) }

{NOT phi }^{ * } : { P(a,b),

NOT P(b,a) }

If we fix the cardinality to 2, `FINDER` generates 240 models, shown (in part) in the following table:

| Models | m_1 | m_2 | m_3 | m_4 | m_5 | m_6 | m_7 | m_8 | ... | m_100 | m_101 | m_102 | ... | m_239 | m_240 |
|--------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-------|-------|-------|-----|-------|-------|
| P(0,0) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| P(0,1) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| P(1,0) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| P(1,1) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| Q(0,0) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| Q(0,1) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| Q(1,0) | F | F | F | F | F | F | F | F | ... | T | T | T | ... | T | T |
| Q(1,1) | F | F | F | F | F | F | F | F | ... | F | F | F | ... | T | T |
| f(0,0) | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | ... | 1 | 1 | 1 | ... | 1 | 1 |
| f(0,1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 1 | ... | 1 | 1 |
| f(1,0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | ... | 1 | 1 |
| f(1,1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | ... | 1 | 1 |
| a      | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | ... | 1 | 0 | 0 | ... | 1 | 1 |
| b      | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 1 | 0 | 1 | ... | 0 | 1 |

The number of models generated from Gamma is 240. The procedure is now set up to count, for each possible step in the calculus (resolution, factorisation), how many interpretations in M make the resulting clause true.

As shown in Figure 7, the application of resolution to Link-1, Link-2 and Link-3 gives rise to Resolvent-1, Resolvent-2, and Resolvent-3, respectively. These resolvents inherit links from their parents. While the number of models of each link needs to be recalculated, this is computationally not too expensive compared to other forms of heuristics. Furthermore, as remarked in the footnote above, the approach can be refined to minimise this cost by making stronger use of inheritance: the model set of an inherited link must be a subset of the model set of the corresponding link of the parent clause, which makes recalculation even easier. In Figure 7, the model numbers of the inherited links are 128, 216 and 224. If there is more than one link with the same minimal number of models, the one that was created first is selected. The two clauses P(a,b) and NOT P(b,a) now have pure literals and are removed from the graph.
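The inheritance refinement mentioned above can be sketched in a few lines (the names are ours, for illustration only): if each link is labelled with the set of reference-model indices in which its resolvent is true, then, following the remark above, an inherited link only needs to recheck the models in its parent link's label rather than the whole reference class.

```python
def label_link(resolvent, candidate_indices, models, satisfies):
    """Indices (among candidate_indices) of the models satisfying the
    resolvent.  For a fresh link, candidate_indices is the whole
    reference class; for an inherited link, only the parent's label."""
    return {i for i in candidate_indices if satisfies(resolvent, models[i])}

# Ground illustration: clauses as frozensets of signed atoms,
# models as sets of true atoms.
def satisfies(clause, model):
    return any((lit > 0) == (abs(lit) in model) for lit in clause)
```

A child link labelled this way inspects only the parent's models, so labelling gets cheaper as labels shrink along a derivation.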

In Figure 7, there are three new resolvents and ten new links. We display only the number of models for the ten newly generated links. Six of the links have 128 models, two have 216, and two have 224. So far the smallest number is still 128, and Link-4 is selected because it was created first. When we resolve on Link-4, five new links are created. One of them has only 112 models, which is the smallest so far. We select it and apply resolution. The new resolvent also has a link with 112 models, which is selected next. After selecting it and producing another new resolvent, three new links are made, with 128, 112, and 0 models, respectively. The link with 0 models connects the clauses NOT Q(f(a,b),b) and Q(f(a,b),b), and resolving on it produces the empty clause.

Although the semantic guidance results in a smaller search space, it may be expensive to maintain the full set of original models and check every link with respect to all models, of which there are often thousands or more, even with the domain cardinality fixed to 2. Hence it is advisable to keep the reference set manageable in order to make the approach practically feasible.

Table 8 shows the number of models and the number of steps for the example taken from section 5. The second column shows the factor n, indicating that every nth model of the original model class is taken to form the reference set.

| Models | Factor | Steps |
|--------|--------|----------|
| 240 | 1 | 6 |
| 120 | 2 | 6 |
| 80 | 3 | 6 |
| 60 | 4 | Time out |
| 48 | 5 | 6 |
| 40 | 6 | 6 |
| 30 | 8 | Time out |
| 24 | 10 | 8 |
| 20 | 12 | 11 |
| 15 | 16 | Time out |
| 12 | 20 | Time out |
| 10 | 24 | Time out |

One interesting feature is that the models do not provide good
heuristics if the factor is 4, 8 or 16. In this particular
example, as we have seen in the table of models, the assumption and
conclusion contain two Skolem constants a and b. The interpretations
of these constants repeat with period 4 if the domain
D is fixed to {0, 1}, which means that if we take every
nth model where n = 4i (i an integer), both a and b are
always interpreted as 1. In the experiments we avoided such
non-informative subsets of the reference models by choosing the factor
to be an odd prime number, such as 7, 11, 13, 17, 19, etc., or by
making a random selection.
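The two ways of thinning the reference class discussed above, a fixed sampling factor versus a random selection, amount to something like the following sketch (function names are ours):

```python
import random

def every_nth(models, n):
    """Keep every n-th model: with the 240 models above and n = 6,
    40 models remain (the 'Factor' column of the table)."""
    return models[::n]

def random_subset(models, k, seed=0):
    """A random selection, which avoids the periodic, non-informative
    subsets that factors like 4, 8 or 16 produce in this example."""
    return random.Random(seed).sample(models, k)
```

A fixed seed makes the random selection reproducible across runs, which is convenient when comparing step counts.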

So far all model sets used in our examples were generated with the domain cardinality fixed to 2. Typically, a bigger cardinality makes each model more informative, and models of cardinality 2 cannot be considered representative for all finite models. However, as our experiments show, even the class of two-element models often bears useful semantic information from which good heuristics for our purpose can be derived. While it is easy to use models of larger cardinality in our approach and to choose randomly a feasible set from all models, model generation typically takes much longer for higher cardinalities. In the example above, with a cardinality of 3 the number of models is 939078, and with cardinality 4 even 4194304. In the following we fix the domain cardinality for our experiments to 2.

The following table compares the set of support strategy (SOS), the semantic clause graph prover (`SCG`) with the original and with a reduced model set, and Otter:

| Problem | SOS Steps | SCG (Original) Models | SCG (Original) Steps | SCG (Reduced) Models | SCG (Reduced) Steps | Otter Steps |
|---------|-----------|-----------------------|----------------------|----------------------|----------------------|-------------|
| Par/Orth | 82 | 240 | 6 | 40 | 6 | 17 |
| Pelletier 35 | 0 | 4096 | 0 | 40 | 0 | 0 |
| Pelletier 36 | 4 | 592 | 7 | 53 | 7 | 11 |
| Pelletier 37 | 3 | 664320 | timeout | 67 | 3 | 3 |
| Pelletier 38 | timeout | 4194304 | timeout | 81 | timeout | timeout |
| Pelletier 39* | 5 | 32 | 0 | 32 | 0 | 2 |
| Pelletier 40* | timeout | 128 | 10 | 42 | 14 | 48 |
| Pelletier 41* | 9 | 18 | 2 | 18 | 2 | 2 |
| Pelletier 42* | timeout | 128 | 6 | 42 | 6 | 17 |
| Pelletier 43* | timeout | 640 | 101 | 91 | 101 | 138 |
| Pelletier 44 | 8 | 2336 | 16 | 75 | 18 | 10 |
| Pelletier 45 | timeout | 181760 | timeout | 86 | timeout | 28 |
| Pelletier 46 | timeout | 29968 | timeout | 80 | timeout | 52 |
| Pelletier 47 | timeout | - | - | - | - | timeout |

Otter: the number of clauses Otter generates.

*: the semantic clause graph prover takes fewer steps than the set of support strategy.

-: the model generation itself took longer than 12 hours.

We have tested our semantic clause graph theorem prover (`SCG`) on the example from section 5 and on a number of problems from Pelletier's collection [Pelletier86], comparing it with the set of support strategy and with Otter. The results are displayed in the table above.

In some examples, such as Problems 36 and 44, the semantic clause graph prover needs more steps; these problems are relatively easy, and with the set of support strategy a proof may be found by chance within the first few steps. In other examples, such as Problems 39, 40, 41, 42 and 43, the semantic information provides good heuristics for guiding the search: it significantly reduces the number of steps required to find a proof compared to the set of support approach.

The rightmost column is the number of clauses that Otter
generates. Note that our implementation is only a prototype and is
intended to show the potential of the approach. On the one hand, the
current implementation is based on the Lisp extension `KEIM`
[HKKMx94b] and is not a very fast system; furthermore, semantic
guidance is the only search control it has, while Otter employs a
variety of sophisticated search control mechanisms. On the other hand,
each link in `SCG` already contains a resolvent rather than a
unifier. For these reasons, a comparison of the run time behaviour of
the systems would not be informative. In order to make a fair
comparison with the Otter prover, it would be necessary to
integrate the approach directly into Otter.

We have described the semantic clause graph prover and presented
experimental results. Our approach differs from other semantically
guided provers in that it generates models only once at the beginning
and uses them during the rest of the procedure, while other semantic
approaches like `SCOTT` [SLM94] generate models not only from
the initial clauses but also from new ones. This is an advantage of
our approach, since in general model generation is computationally much
more expensive than model checking. Of course, we lose `SCOTT`'s
advantage of having a model set which intimately reflects the current
state of the proof search.

The performance of semantically guided resolution depends to a great degree on the availability of good models. Our experiments, which make use of two-element models, show that valuable heuristic information can be obtained even there. To which degree other model classes, such as non-isomorphic models, are more promising has to be studied for different example classes. Semantic guidance seems to be a standard technique humans use when searching for proofs. Rather than generating models from scratch, they often use typical examples. It remains an open question whether and to what extent this is necessary in our context. It may also be important to use infinite models, as investigated by Caferra and Peltier [CP98].

The semantic clause graph prover can be extended by adding a pre-processor that performs semantic refinements of the assumption clauses [Choi02a]. The idea is to use models to guide the search not, as in other approaches, backwards towards an empty clause between the assumptions and the negated conclusion, but forwards, by rewriting the assumptions into a form that is semantically closer to the conclusion. Once the potential of semantic guidance is better understood, this heuristic information can be used alongside other types of heuristic information, which constitutes the strength of up-to-date systems.

- [Cade-94]
A. Bundy, editor.
Proceedings of the 12th CADE, LNAI 814. Springer-Verlag, 1994.
- [CP98]
R. Caferra and N. Peltier.
Disinference rules, model building and abduction.
In E. Orłowska, editor, Logic at Work: Essays Dedicated to
the Memory of Helena Rasiowa, chapter 20, pages 331-353. Physica-Verlag,
1999.
- [Choi02a]
S. Choi.
Towards semantic goal-directed forward reasoning in resolution.
In D. Scott, editor, Proceedings of the AIMSA'2002, LNAI 2443, pages 243-252, 2002.
- [CP94]
H. Chu and D.A. Plaisted.
Semantically guided first-order theorem proving using hyper-linking.
In Bundy [Cade-94], pages 192-206.
- [Eisinger91]
N. Eisinger.
Completeness, Confluence, and Related Properties of Clause Graph
Resolution.
Research Notes in Artificial Intelligence. Pitman, London, 1991.
- [EO86]
N. Eisinger and H.J. Ohlbach.
The Markgraf Karl Refutation Procedure (MKRP).
In J. Siekmann, editor, Proceedings of the 8th CADE,
LNAI 230, pages 681-682. Springer-Verlag, 1986.
- [HS01]
K. Hodgson and J. Slaney.
Development of a semantically guided theorem prover.
In Goré et al. Proceedings of the IJCAR 2001, LNAI 2083, pages 443-447. Springer-Verlag, 2001.
- [HKKMx94b]
X. Huang, M. Kerber, M. Kohlhase, E. Melis, D. Nesmith, J. Richts, and J. Siekmann.
KEIM: A toolkit for automated deduction.
In Bundy [Cade-94], pages 807-810.
- [Kowalski75]
R. Kowalski.
A proof procedure using connection graphs.
Journal of the Association for Computing Machinery,
22(4):572-595, 1975.
- [LP92]
S.J. Lee and D.A. Plaisted.
Eliminating duplication with the hyper-linking strategy.
Journal of Automated Reasoning, 9(1):25-42, 1992.
- [McCune94]
W. McCune.
OTTER 3.0 Reference Manual and Guide.
Mathematics and Computer Science Division, Argonne National
Laboratory, Argonne, Illinois, USA, 1994.
- [McCune97]
W. McCune.
Solution of the Robbins problem.
Journal of Automated Reasoning, 19:263-276, 1997.
- [Pelletier86]
F.J. Pelletier.
Seventy-five problems for testing automatic theorem provers.
Journal of Automated Reasoning, 2(2):191-216, 1986.
- [PZ00]
D.A. Plaisted and Y. Zhu.
Ordered semantic hyper-linking.
Journal of Automated Reasoning, 25(3):167-217, 2000.
- [RV01]
A. Riazanov and A. Voronkov.
Vampire 1.1.
In Goré et al. Proceedings of the IJCAR 2001, LNAI 2083, pages 376-380. Springer-Verlag, 2001.
- [Robinson65]
J.A. Robinson.
A machine-oriented logic based on the resolution principle.
Journal of the Association for Computing Machinery,
12(1):23-41, 1965.
- [SW02]
J. Siekmann and G. Wrightson.
An open research problem: Strong completeness of R. Kowalski's
connection graph proof procedure.
Logic Journal of the IGPL, 10(1):85-103, 2002.
- [Slaney94]
J. Slaney.
FINDER: Finite Domain Enumerator.
In Bundy [Cade-94], pages 798-801.
- [SLM94]
J. Slaney, E. Lusk, and W. McCune.
SCOTT: Semantically Constrained Otter.
In Bundy [Cade-94], pages 764-768.
- [WGR96]
C. Weidenbach, B. Gaede, and G. Rock.
SPASS & FLOTTER, Version 0.42.
In M. A. McRobbie and J. K. Slaney, editors, Proceedings of the
13th CADE, LNAI 1104, pages 141-145. Springer-Verlag, 1996.
- [WRC65]
L. Wos, G.A. Robinson, and D.F. Carson.
Efficiency and completeness of the set of support strategy in theorem
proving.
Journal of the Association for Computing Machinery,
12(4):536-541, 1965.

© 2002, Springer-Verlag. Seungyeob Choi, Manfred Kerber

