This work was supported by EPSRC grant GR/M22031 and DFG-SFB 378 (first author), EU Marie-Curie-Fellowship HPMF-CT-2002-01701 (second author), and EPSRC Advanced Research Fellowship GR/R76783 (third author).

66041 Saarbrücken, Germany (http://www.ags.uni-sb.de/~chris)

Birmingham B15 2TT, England, UK (http://www.cs.bham.ac.uk/~vxs, http://www.cs.bham.ac.uk/~mmk)

Cambridge CB3 0FD, England, UK (http://www.cl.cam.ac.uk/~mj201)

**ABSTRACT**

State-of-the-art first-order automated theorem proving systems have reached considerable strength over recent years. However, in many areas of mathematics they are still a long way from reliably proving theorems that would be considered relatively simple by humans. For example, when reasoning about sets, relations, or functions, first-order systems still exhibit serious weaknesses. While it has been shown in the past that higher-order reasoning systems can solve problems of this kind automatically, the complexity inherent in their calculi and their inefficiency in dealing with large numbers of clauses prevent these systems from solving a whole range of problems. We present a solution to this challenge by combining a higher-order and a first-order automated theorem prover, both based on the resolution principle, in a flexible and distributed environment. This allows us to exploit concise problem formulations without forgoing efficient reasoning on first-order subproblems. We demonstrate the effectiveness of our approach on a set of problems still considered non-trivial for many first-order theorem provers.

When dealing with problems containing higher-order concepts, such as sets, functions, or relations, today's state-of-the-art first-order automated theorem provers (ATPs) still exhibit weaknesses on problems considered relatively simple by humans (cf. [GanzingerStuber-03-cade]). One reason is that the problem formulations use an encoding in first-order set theory, which makes it particularly challenging to prove theorems from first principles, that is, from the basic axioms alone. Therefore, to aid ATPs in finding proofs, problems are often enriched with hand-picked additional lemmata, or axioms of the selected set theory are dropped, leaving the theory incomplete. This has recently motivated extensions of state-of-the-art first-order calculi and systems, as presented, for example, in [GanzingerStuber-03-cade] for the Saturate system. The extended Saturate system can solve some problems from the SET domain of the TPTP [SuSu98] which Vampire [vampire01] and E-Setheo [SW:TABLEAUX-2000] cannot solve.

While it has already been shown in [C3,T2] that many problems of this nature can easily be proved from first principles using a concise higher-order representation and the higher-order resolution ATP Leo, the combinatorial explosion inherent in Leo's calculus prevents the prover from solving a whole range of problems with one universal strategy. Often higher-order problems require only relatively few, but essential, steps of higher-order reasoning, while the overwhelming part of the reasoning is at first-order or even propositional level. This suggests that Leo's performance could be improved by combining it with a first-order ATP that searches efficiently for a refutation among the subset of those clauses that are essentially first-order.

The advantages of such a combination -- further discussed in
Sec. 2 -- are not only that many problems can still be
efficiently proved from first principles in a general-purpose approach, but
also that problems can be expressed in a very concise way. For instance, we
present 45 problems from the SET domain of the TPTP-v3.0.1,
together with their entire formalisation, in less than two pages of this
paper, which is difficult to achieve within a framework that does not provide
`lambda`-abstraction. We use this problem set, which is an extension of
the problems considered in [GanzingerStuber-03-cade], in
Sec. 4 to show the effectiveness of our approach. While many of
the considered problems can be proved by Leo alone with some strategy,
the combination of Leo with the first-order ATP
Bliksem [Nivelle99] is not only able to show more problems, but also
needs only a single strategy to solve them. Several of our problems are
considered very challenging by the first-order community, and five of them
(of which Leo can solve four) have a TPTP rating of 1.00, meaning that
they cannot be solved by any TPTP prover to date.

Technically, the combination -- described in more detail in Sec. 3 -- has been realised in the concurrent reasoning system Oants [Sorge01phd,BeSo01], which enables the cooperation of hybrid reasoning systems to construct a common proof object. In our past experiments, Oants was successfully employed to check the validity of set equations using higher-order and first-order ATPs, model generation, and computer algebra [BeJaKeSo01b]. While this already enabled a cooperation between Leo and a first-order ATP, the proposed solution could not be classified as a general-purpose approach. A major shortcoming was that all communication of partial results had to be conducted via the common proof object, which was very inefficient for hard examples. Thus, the solved examples from set theory were considered too trivial, even though they were often similar to problems still considered challenging in the first-order context of the TPTP. In this paper we present a novel approach to the cooperation between Leo and Bliksem inside Oants that decentralises communication. This leads not only to higher overall efficiency -- Sec. 4 details our results -- but also to a general-purpose approach based on a single strategy in Leo.

Existing higher-order ATPs generally exhibit deficits in efficiently reasoning with first-order problems, for several reasons. Unlike for first-order provers, for which sophisticated calculi and strategies as well as advanced implementation techniques, such as term indexing [NHRV01], have been developed, fully mechanisable higher-order calculi are still at a comparably early stage of development. Moreover, some problems are much harder in the higher-order setting: unification is undecidable, strongly constraining term and literal orderings are not available, and extensionality reasoning and set variable instantiation have to be addressed. Nevertheless, for some mathematical problem domains, such as naive set theory, automated higher-order reasoning performs very well.

We motivate the need for linking higher-order and first-order ATPs with some examples from Table 1.

| Example (SET) | Problem Formalisation |
| ------------- | --------------------- |
| SET014+4 | FORALL X_{o alpha}, Y_{o alpha}, A_{o alpha} . [[(X subseteq A) & (Y subseteq A)] -> ((X union Y) subseteq A)] |
| SET017+1 | FORALL x_{alpha}, y_{alpha}, z_{alpha} . [UnOrderedPair(x,y) = UnOrderedPair(x,z) -> y = z] |
| SET066+1 | FORALL x_{alpha}, y_{alpha} . [UnOrderedPair(x,y) = UnOrderedPair(y,x)] |
| SET067+1 | FORALL x_{alpha}, y_{alpha} . [(UnOrderedPair(x,x) subseteq UnOrderedPair(x,y))] |
| SET076+1 | FORALL x_{alpha}, y_{alpha} . FORALL Z_{o alpha} . [x in Z & y in Z -> (UnOrderedPair(x,y) subseteq Z)] |
| SET086+1 | FORALL x_{alpha} . EXISTS y_{alpha} . [y in Singleton(x)] |
| SET096+1 | FORALL X_{o alpha}, y_{alpha} . [(X subseteq Singleton(y)) -> [X = EMPTYSET \| X = Singleton(y)]] |
| SET143+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [((X intersect Y) intersect Z) = (X intersect (Y intersect Z))] |
| SET171+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [(X union (Y intersect Z)) = ((X union Y) intersect (X union Z))] |
| SET580+3 | FORALL X_{o alpha}, Y_{o alpha}, u_{alpha} . [u in ExclUnion(X,Y) <-> [u in X <-> u NOT in Y]] |
| SET601+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [((X intersect Y) union ((Y intersect Z) union (Z intersect X))) = ((X union Y) intersect ((Y union Z) intersect (Z union X)))] |
| SET606+3 | FORALL X_{o alpha}, Y_{o alpha} . [(X WITHOUT (X intersect Y)) = (X WITHOUT Y)] |
| SET607+3 | FORALL X_{o alpha}, Y_{o alpha} . [(X union (Y WITHOUT X)) = (X union Y)] |
| SET609+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [(X WITHOUT (Y WITHOUT Z)) = ((X WITHOUT Y) union (X intersect Z))] |
| SET611+3 | FORALL X_{o alpha}, Y_{o alpha} . [(X intersect Y) = EMPTYSET <-> (X WITHOUT Y) = X] |
| SET612+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [(X WITHOUT (Y union Z)) = ((X WITHOUT Y) intersect (X WITHOUT Z))] |
| SET614+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [((X WITHOUT Y) WITHOUT Z) = (X WITHOUT (Y union Z))] |
| SET615+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [((X union Y) WITHOUT Z) = ((X WITHOUT Z) union (Y WITHOUT Z))] |
| SET623+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [ExclUnion(ExclUnion(X,Y),Z) = ExclUnion(X,ExclUnion(Y,Z))] |
| SET624+3 | FORALL X_{o alpha}, Y_{o alpha}, Z_{o alpha} . [Meets(X,(Y union Z)) <-> [Meets(X,Y) \| Meets(X,Z)]] |
| SET630+3 | FORALL X_{o alpha}, Y_{o alpha} . [Misses((X intersect Y),ExclUnion(X,Y))] |
| SET640+3 | FORALL R_{o beta alpha}, Q_{o beta alpha} . [Subrel(R,Q) -> Subrel(R,((lambda u_{alpha} . T)*(lambda v_{beta} . T)))] |
| SET646+3 | FORALL x_{alpha}, y_{beta} . [Subrel(Pair(x,y),((lambda u_{alpha} . T)*(lambda v_{beta} . T)))] |
| SET647+3 | FORALL R_{o beta alpha}, X_{o alpha} . [(RDom(R) subseteq X) -> Subrel(R,(X*RCoDom(R)))] |
| SET648+3 | FORALL R_{o beta alpha}, Y_{o beta} . [(RCoDom(R) subseteq Y) -> Subrel(R,(RDom(R)*Y))] |
| SET649+3 | FORALL R_{o beta alpha}, X_{o alpha}, Y_{o beta} . [[(RDom(R) subseteq X) & (RCoDom(R) subseteq Y)] -> Subrel(R,(X*Y))] |
| SET651+3 | FORALL R_{o beta alpha} . [(RDom(R) subseteq A_{o alpha}) -> Subrel(R,(A*(lambda u_{beta} . T)))] |
| SET657+3 | FORALL R_{o beta alpha} . [(Field(R) subseteq ((lambda u_{alpha} . T) union (lambda v_{beta} . T)))] |
| SET669+3 | FORALL R_{o alpha alpha} . [Subrel(Id(lambda u_{alpha} . T),R) -> [((lambda u_{alpha} . T) subseteq RDom(R)) & (lambda u_{alpha} . T) = RCoDom(R)]] |
| SET670+3 | FORALL Z_{o alpha}, R_{o beta alpha}, X_{o alpha}, Y_{o beta} . [IsRelOn(R,X,Y) -> IsRelOn(RestrictRDom(R,Z),Z,Y)] |
| SET671+3 | FORALL Z_{o alpha}, R_{o beta alpha}, X_{o alpha}, Y_{o beta} . [[IsRelOn(R,X,Y) & (X subseteq Z)] -> RestrictRDom(R,Z) = R] |
| SET672+3 | FORALL Z_{o beta}, R_{o beta alpha}, X_{o alpha}, Y_{o beta} . [IsRelOn(R,X,Y) -> IsRelOn(RestrictRCoDom(R,Z),X,Z)] |
| SET673+3 | FORALL Z_{o beta}, R_{o beta alpha}, X_{o alpha}, Y_{o beta} . [[IsRelOn(R,X,Y) & (Y subseteq Z)] -> RestrictRCoDom(R,Z) = R] |
| SET680+3 | FORALL R_{o beta alpha}, X_{o alpha}, Y_{o beta} . [IsRelOn(R,X,Y) -> [FORALL u_{alpha} . u in X -> [u in RDom(R) <-> EXISTS v_{beta} . v in Y & R(u,v)]]] |
| SET683+3 | FORALL R_{o beta alpha}, X_{o alpha}, Y_{o beta} . [IsRelOn(R,X,Y) -> [FORALL v_{beta} . v in Y -> [v in RCoDom(R) -> EXISTS u_{alpha} . u in X & u in RDom(R)]]] |
| SET684+3 | FORALL P_{o beta alpha}, R_{o gamma beta}, x_{alpha}, z_{gamma} . [RelComp(P,R) x z <-> EXISTS y_{beta} . P x y & R y z] |
| SET686+3 | FORALL Z_{o beta}, R_{o beta alpha}, x_{alpha} . [x in InverseImageR(R,Z) <-> EXISTS y_{beta} . R x y & y in Z] |
| SET716+4 | FORALL F_{beta alpha}, G_{gamma beta} . [[Inj(F) & Inj(G)] -> Inj((F o G))] |
| SET724+4 | FORALL F_{beta alpha}, G_{gamma beta}, H_{gamma beta} . [[(G o F) = (H o F) & Surj(F)] -> G = H] |
| SET741+4 | FORALL F_{beta alpha}, G_{gamma beta}, H_{alpha gamma} . [[Inj((F o G) o H) & Surj((G o H) o F) & Surj((H o F) o G)] -> Bij(H)] |
| SET747+4 | FORALL F_{beta alpha}, G_{gamma beta}, <=^{1}_{o alpha alpha}, <=^{2}_{o beta beta}, <=^{3}_{o gamma gamma} . [[IncreasingF(F,<=^{1},<=^{2}) & DecreasingF(G,<=^{2},<=^{3})] -> DecreasingF((G o F),<=^{1},<=^{3})] |
| SET752+4 | FORALL X_{o alpha}, Y_{o alpha}, F_{beta alpha} . [InverseImageF(F,(X union Y)) = (InverseImageF(F,X) union InverseImageF(F,Y))] |
| SET753+4 | FORALL X_{o alpha}, Y_{o alpha}, F_{beta alpha} . [(InverseImageF(F,(X intersect Y)) subseteq (InverseImageF(F,X) intersect InverseImageF(F,Y)))] |
| SET764+4 | FORALL F_{beta alpha} . [InverseImageF(F,EMPTYSET) = EMPTYSET] |
| SET770+4 | FORALL R_{o alpha alpha}, Q_{o alpha alpha} . [[EquivRel(R) & EquivRel(Q)] -> [EquivClasses(R) = EquivClasses(Q) \| Disjoint(EquivClasses(R),EquivClasses(Q))]] |

It contains a range of challenging problems taken from the TPTP,
against which we will evaluate our system in Sec. 4. The
problems are given by the identifiers used in the SET domain of the
TPTP, and are formalised in a variant of Church's simply typed
`lambda`-calculus with prefix polymorphism. In classical type theory
terms and all their sub-terms are typed. Polymorphism allows the
introduction of type variables such that statements can be made for
all types. For instance, in problem SET014+4 the universally
quantified variable X_{o alpha } denotes a mapping from objects of
type alpha to objects of type o. We use Church's notation
o alpha , which stands for the functional type alpha -> o.
The reader is referred to [Andrews02] for a more detailed
introduction. In the remainder, o denotes the type of truth
values, and small Greek letters denote arbitrary types. Thus,
X_{o alpha} (resp. its eta-long form `lambda` y_{ alpha} . X y)
is actually a characteristic function denoting the set of elements of
type alpha for which the predicate associated with X holds. As a
further notational convention, we use capital letters for variables
denoting sets, functions, or relations, while lower-case letters denote
individuals. Types are usually given only at the first occurrence of
a variable and are omitted when inferable from the context.

The problems in Table 1 employ defined concepts that are specified in a knowledge base of hierarchical theories that Leo has access to. All concepts necessary for defining our problems in Table 1 are given in Table 2.

| Defined Notions in Theory Typed Set | |
| ----------------------------------- | - |
| _ in _ | lambda x_{alpha}, A_{o alpha} . [A x] |
| EMPTYSET | [lambda x_{alpha} . F] |
| (_ subseteq _) | lambda A_{o alpha}, B_{o alpha} . [FORALL x_{alpha} . x in A -> x in B] |
| (_ union _) | lambda A_{o alpha}, B_{o alpha} . [lambda x_{alpha} . x in A v x in B] |
| (_ intersect _) | lambda A_{o alpha}, B_{o alpha} . [lambda x_{alpha} . x in A & x in B] |
| Complement(_) | lambda A_{o alpha} . [lambda x_{alpha} . x NOT in A] |
| (_ WITHOUT _) | lambda A_{o alpha}, B_{o alpha} . [lambda x_{alpha} . x in A & x NOT in B] |
| ExclUnion(_,_) | lambda A_{o alpha}, B_{o alpha} . [(A WITHOUT B) union (B WITHOUT A)] |
| Disjoint(_,_) | lambda A_{o alpha}, B_{o alpha} . [(A intersect B) = EMPTYSET] |
| Meets(_,_) | lambda A_{o alpha}, B_{o alpha} . [EXISTS x_{alpha} . x in A & x in B] |
| Misses(_,_) | lambda A_{o alpha}, B_{o alpha} . [NOT EXISTS x_{alpha} . x in A & x in B] |

| Defined Notions in Theory Relation | |
| ---------------------------------- | - |
| UnOrderedPair(_,_) | lambda x_{alpha}, y_{alpha} . [lambda u_{alpha} . u = x v u = y] |
| Singleton(_) | lambda x_{alpha} . [lambda u_{alpha} . u = x] |
| Pair(_,_) | lambda x_{alpha}, y_{beta} . [lambda u_{alpha}, v_{beta} . u = x & v = y] |
| (_*_) | lambda A_{o alpha}, B_{o beta} . [lambda u_{alpha}, v_{beta} . u in A & v in B] |
| RDom(_) | lambda R_{o beta alpha} . [lambda x_{alpha} . EXISTS y_{beta} . R x y] |
| RCoDom(_) | lambda R_{o beta alpha} . [lambda y_{beta} . EXISTS x_{alpha} . R x y] |
| Subrel(_,_) | lambda R_{o beta alpha}, Q_{o beta alpha} . [FORALL x_{alpha} . FORALL y_{beta} . R x y -> Q x y] |
| Id(_) | lambda A_{o alpha} . [lambda x_{alpha}, y_{alpha} . x in A & x = y] |
| Field(_) | lambda R_{o beta alpha} . [RDom(R) union RCoDom(R)] |
| IsRelOn(_,_,_) | lambda R_{o beta alpha}, A_{o alpha}, B_{o beta} . [FORALL x_{alpha}, y_{beta} . R x y -> (x in A & y in B)] |
| RestrictRDom(_,_) | lambda R_{o beta alpha}, A_{o alpha} . [lambda x_{alpha}, y_{beta} . x in A & R x y] |
| RestrictRCoDom(_,_) | lambda R_{o beta alpha}, B_{o beta} . [lambda x_{alpha}, y_{beta} . y in B & R x y] |
| RelComp(_,_) | lambda R_{o beta alpha}, Q_{o gamma beta} . [lambda x_{alpha}, z_{gamma} . EXISTS y_{beta} . R x y & Q y z] |
| InverseImageR(_,_) | lambda R_{o beta alpha}, B_{o beta} . [lambda x_{alpha} . EXISTS y_{beta} . y in B & R x y] |
| Reflexive(_) | lambda R_{o alpha alpha} . [FORALL x_{alpha} . R x x] |
| Symmetric(_) | lambda R_{o alpha alpha} . [FORALL x_{alpha} . FORALL y_{alpha} . R x y -> R y x] |
| Transitive(_) | lambda R_{o alpha alpha} . [FORALL x_{alpha} . FORALL y_{alpha} . FORALL z_{alpha} . R x y & R y z -> R x z] |
| EquivRel(_) | lambda R_{o alpha alpha} . [Reflexive(R) & Symmetric(R) & Transitive(R)] |
| EquivClasses(_) | lambda R_{o alpha alpha} . [lambda A_{o alpha} . EXISTS u_{alpha} . u in A & FORALL v_{alpha} . v in A <-> R u v] |

| Defined Notions in Theory Function | |
| ---------------------------------- | - |
| Inj(_) | lambda F_{beta alpha} . [FORALL x_{alpha}, y_{alpha} . F(x) = F(y) -> x = y] |
| Surj(_) | lambda F_{beta alpha} . [FORALL y_{beta} . EXISTS x_{alpha} . y = F(x)] |
| Bij(_) | lambda F_{beta alpha} . [Surj(F) & Inj(F)] |
| InverseImageF(_,_) | lambda F_{beta alpha}, A_{o alpha} . [lambda y_{beta} . EXISTS x_{alpha} . x in A & y = F(x)] |
| InverseImageF(_,_) | lambda F_{beta alpha}, B_{o beta} . [lambda x_{alpha} . EXISTS y_{beta} . y in B & y = F(x)] |
| (_ o _) | lambda F_{beta alpha}, G_{gamma beta} . [lambda x_{alpha} . G(F(x))] |
| IncreasingF(_,_,_) | lambda F_{beta alpha}, <=^{1}_{o alpha alpha}, <=^{2}_{o beta beta} . [FORALL x_{alpha}, y_{alpha} . x <=^{1} y -> F(x) <=^{2} F(y)] |
| DecreasingF(_,_,_) | lambda F_{beta alpha}, <=^{1}_{o alpha alpha}, <=^{2}_{o beta beta} . [FORALL x_{alpha}, y_{alpha} . x <=^{1} y -> F(y) <=^{2} F(x)] |

Concepts are defined as `lambda`-expressions and may contain other,
previously specified concepts. For presentation purposes, we use customary
mathematical symbols and infix notation for concepts such as union and
intersection. For instance, the definition of union on sets can easily be
read in its more common mathematical representation A union B := {x | x `in`
A `v` x `in` B}. Before proving a problem, Leo always expands --
recursively, if necessary -- all occurring concepts. In our current approach
this straightforward expansion to first principles is realised by an
automated preprocessing step.
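The characteristic-function view of sets underlying Table 2 translates directly into executable terms. The following sketch is our own illustration (not Leo code; all names are ours): it mirrors the definitions of union, intersection, subset, and extensional equality, and checks the SET171+3 distributivity law on a small finite domain.

```python
# Sets as characteristic functions, mirroring the lambda-definitions of
# Table 2 (illustrative sketch only; all names are ours, not Leo's).

def union(A, B):
    # (A union B) := lambda x . x in A v x in B
    return lambda x: A(x) or B(x)

def intersect(A, B):
    # (A intersect B) := lambda x . x in A & x in B
    return lambda x: A(x) and B(x)

def subseteq(A, B, domain):
    # (A subseteq B) := FORALL x . x in A -> x in B, checked on a finite domain
    return all((not A(x)) or B(x) for x in domain)

def set_eq(A, B, domain):
    # extensional equality of characteristic functions on a finite domain
    return all(A(x) == B(x) for x in domain)

domain = range(8)
A = lambda x: x % 2 == 0          # {0, 2, 4, 6}
B = lambda x: x < 4               # {0, 1, 2, 3}
C = lambda x: x in (1, 2, 7)      # {1, 2, 7}

# SET171+3 on this finite domain: X union (Y intersect Z) = (X union Y) intersect (X union Z)
lhs = union(A, intersect(B, C))
rhs = intersect(union(A, B), union(A, C))
print(set_eq(lhs, rhs, domain))   # True
```

Of course Leo proves such statements for arbitrary domains; the finite check merely illustrates how the definitions compose by `beta`-reduction.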

**SET171+3** We first discuss example SET171+3 to contrast our
formalisation with a standard first-order one. After recursively
expanding the input problem, that is, completely reducing it to first
principles, Leo turns it into a negated unit clause. Since this
initial clause is not in normal form, Leo first normalises it with
explicit clause normalisation rules to reach proper initial
clauses. In our concrete case, this normalisation process leads to the
following unit clause consisting of a (syntactically not solvable)
unification constraint (here
**B_{o alpha }, C_{o alpha }, D_{o alpha }** are Skolem constants and
=^{?} marks a unification constraint):

[(`lambda` x_{ alpha} . Bx `v` (Cx `&` Dx)) =^{?} (`lambda` x_{ alpha} . (Bx `v` Cx) `&` (Bx `v` Dx))]
Note that negated primitive equations are generally automatically converted
by Leo into unification constraints. This is why [(`lambda` x_{ alpha}
. Bx `v` (Cx `&` Dx)) =^{?} (`lambda` x_{ alpha} . (Bx `v` Cx)
`&` (Bx `v` Dx))] is generated, and not [(`lambda` x_{ alpha} . Bx
`v` (Cx `&` Dx)) = (`lambda` x_{ alpha} . (Bx `v` Cx) `&` (Bx `v`
Dx))]^{F}. Observe that we write [.]^{T} and [.]^{F} for positive and
negative literals, respectively. Leo then applies its goal-directed
functional and Boolean extensionality rules, which replace this unification
constraint by the negative literal (where x is a Skolem constant):

[Bx `v` (Cx `&` Dx) <-> (Bx `v` Cx) `&` (Bx `v` Dx)]^{F}
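At this point the refutation task has become purely propositional: the extensionality steps reduce SET171+3 to the validity of Bx `v` (Cx `&` Dx) <-> (Bx `v` Cx) `&` (Bx `v` Dx). A quick exhaustive truth-table check (our own illustration, not part of Leo) confirms that the negated literal has no model:

```python
from itertools import product

# The Boolean-extensionality step reduces SET171+3 to the propositional
# validity of  Bx v (Cx & Dx)  <->  (Bx v Cx) & (Bx v Dx).
# Enumerating all eight truth assignments shows the equivalence is valid,
# so its negation (Leo's remaining literal) is refutable.
valid = all(
    (b or (c and d)) == ((b or c) and (b or d))
    for b, c, d in product([False, True], repeat=3)
)
print(valid)  # True
```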

Let us consider now this same example SET171+3 in its first-order formulation from the TPTP (see Table 3).

Assumptions:

(1) FORALL B, C, x . [x in (B union C) <-> x in B v x in C]
(2) FORALL B, C, x . [x in (B intersect C) <-> x in B & x in C]
(3) FORALL B, C . [B = C <-> B subset C & C subset B]
(4) FORALL B, C . [B union C = C union B]
(5) FORALL B, C . [B intersect C = C intersect B]
(6) FORALL B, C . [B subset C <-> FORALL x . x in B -> x in C]
(7) FORALL B, C . [B = C <-> FORALL x . x in B <-> x in C]

Proof Goal:

(8) FORALL B, C, D . [B union (C intersect D) = (B union C) intersect (B union D)]

We can observe that the assumptions provide only a partial axiomatisation of naive set theory. On the other hand, the specification introduces lemmata that are useful for solving the problem. In particular, assumption (7) is trivially derivable from (3) together with (6). Obviously, clause normalisation of this first-order problem description yields a much larger and more difficult set of clauses. Furthermore, definitions of concepts are not directly expanded as in Leo. It is therefore not surprising that most first-order ATPs still fail to prove this problem. In fact, very few TPTP provers have been successful in proving SET171+3. Amongst them are Muscadet 2.4 [Pastre01], Vampire 7.0, and Saturate. The natural deduction system Muscadet uses special inference rules for sets and needs 0.2 seconds to prove this problem. Vampire needs 108 seconds. The Saturate system [GanzingerStuber-03-cade] (which extends Vampire with Boolean extensionality rules that correspond one-to-one to Leo's rules for Extensional Higher-Order Paramodulation [C5]) can solve the problem in 2.9 seconds while generating 159 clauses. The significance of such comparisons is clearly limited, since different systems are optimised to different degrees. One noted difference between the experiments with the first-order provers listed above and the experiments with Leo and Leo-Bliksem is that first-order systems often use a case-tailored problem representation (e.g., by avoiding some base axioms of the addressed theory), while Leo and Leo-Bliksem face the harder task of dealing with a general, not specifically tailored, representation.

For the experiments with Leo and the cooperation of Leo with the
first-order theorem prover Bliksem,
`lambda`-abstraction as well as the extensionality treatment inherent
in Leo's calculus [J5] is used. This enables a theoretically
Henkin-complete proof system for set theory. (For pragmatic reasons,
such as efficiency, most of Leo's tactics are incomplete; Leo's
philosophy is to rely on a theoretically complete calculus while in
practice providing a set of complementary strategies that together
cover a broad range of theorems.) In the above example
SET171+3, Leo generally applies functional
extensionality to push extensional unification constraints down to
base type level, and then eventually applies Boolean extensionality to
generate clauses from them. These are typically much simpler and often
even propositional-like or first-order-like (FO-like,
for short), that is, they do not contain any `real' higher-order
subterms (such as a `lambda`-abstraction or embedded equations), and
are therefore suitable for treatment by a first-order ATP or even a
propositional logic decision procedure.
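The notion of an FO-like clause can be made operational with a simple syntactic check. The sketch below is our own simplification (Leo's actual test is more involved): literals are nested tuples `(head, arg1, ...)`, and a literal counts as FO-like if it contains no `lambda`-abstraction and no embedded equation below the literal level.

```python
# A rough, hypothetical FO-likeness test (our own simplification, not
# Leo's implementation). Terms are nested tuples (head, arg1, ...);
# strings stand for variables and constants.

def is_fo_like(term, top=True):
    if not isinstance(term, tuple):          # variable or constant
        return True
    head = term[0]
    if head == 'lambda':                     # a `real' higher-order subterm
        return False
    if head == '=' and not top:              # embedded equation
        return False
    return all(is_fo_like(arg, top=False) for arg in term[1:])

# "x in A & x not in B" flattened to predicate applications: FO-like
lit1 = ('and', ('A', 'x'), ('not', ('B', 'x')))
# an equation between lambda-abstractions: not FO-like
lit2 = ('=', ('lambda', 'x', ('B', 'x')), ('lambda', 'x', ('C', 'x')))

print(is_fo_like(lit1))   # True
print(is_fo_like(lit2))   # False
```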

**SET624+3** Sometimes extensionality treatment is not required
and the originally higher-order problem is immediately reduced to FO-like clauses only.
For example, after expanding the definitions, problem SET624+3 yields the following clause
(where B_{o alpha }, C_{o alpha }, D_{o alpha } are again Skolem constants):

[EXISTS x_{ alpha} . Bx `&` (Cx `v` Dx) <-> (EXISTS x_{ alpha} . Bx `&` Cx) `v` (EXISTS x_{ alpha} . Bx `&` Dx)]^{F}

Normalisation results in 26 FO-like clauses, which present a hard
problem for Leo: it needs approx. 35 seconds (see
Sec. 4) to find a refutation, whereas first-order ATPs need only a
fraction of a second.

**SET646+3** Sometimes problems are immediately refuted
after the initial clause normalisation. Problem SET646+3 is such a case:
after definition expansion, the resulting initial clause (in which
B_{o alpha }, C_{o alpha }, x_{ alpha } are again Skolem constants) is
refuted directly by normalisation.

**SET611+3** The examples discussed so far all
essentially apply extensionality treatment and normalisation to the
input problem in order to immediately generate a set of inconsistent
FO-like clauses. Problem SET611+3 is more complicated, as it requires
several reasoning steps in Leo before the initially consistent
set of available FO-like clauses grows into an inconsistent one.
After definition expansion, Leo first has to process the expanded
input clause with genuinely higher-order inference steps before a
refutable FO-like clause set emerges.

In summary, each of the examples discussed in this section motivates our higher-order/first-order cooperative approach to theorem proving. In particular, the examples show that:

- Higher-order formulations allow for a concise problem representation which often allows easier and faster proof search than first-order formulations.
- Higher-order problems can often be reduced to a set of first-order clauses that can be more efficiently handled by a first-order ATP.
- Some problems are trivially refutable after clause normalisation.
- Some problems require in-depth higher-order reasoning before a refutable first-order clause set can be extracted.

The cooperation between higher-order and first-order reasoners, which we investigate in this paper, is realised in the concurrent hierarchical blackboard architecture Oants [BeSo98b]. We first describe in Sec. 3.1 the existing Oants architecture. In order to overcome some of its problems, in particular efficiency problems, we devised within Oants a new and improved cooperation method for the higher-order ATP Leo and first-order provers (in particular, Bliksem); we describe this in Sec. 3.2. We address the question of how to generate the necessary clauses in Sec. 3.3, and discuss soundness and completeness of our implementation of the higher-order/first-order cooperation in Sec. 3.4.

Oants was originally conceived to support interactive theorem proving but was later extended to a fully automated proving system [Sorge01phd,BeSo01]. Its basic idea is to compose a central proof object by generating, in each proof situation, a ranked list of potentially applicable inference steps. In this process, all inference rules, such as calculus rules or tactics, are uniformly viewed with respect to three sets: premises, conclusions, and additional parameters. The elements of these three sets are called arguments of the inference rule and they usually depend on each other. An inference rule is applicable if at least some of its arguments can be instantiated with respect to the given proof context. The task of the Oants architecture is now to determine the applicability of inference rules by computing instantiations for their arguments.

The architecture consists of two layers. On the lower layer, possible instantiations of the arguments of individual inference rules are computed. In particular, each inference rule is associated with its own blackboard and concurrent processes, one for each argument of the inference rule. The role of every process is to compute possible instantiations for its designated argument of the inference rule, and to record these on the blackboard. The computations are carried out with respect to the given proof context and by exploiting information already present on the blackboard, that is, argument instantiations computed by other processes. On the upper layer, the information from the lower layer is used for computing and heuristically ranking the inference rules that are applicable in the current proof state. The most promising rule is then applied to the central proof object and the data on the blackboards is cleared for the next round of computations.
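The two-layer control cycle can be caricatured in a few lines of sequential code. This is a deliberate simplification (the real Oants runs the argument processes concurrently, and all rule names and data structures here are hypothetical): per-rule processes write candidate argument instantiations to that rule's blackboard, and the upper layer ranks and applies an applicable rule.

```python
# A sequential caricature of Oants's two layers (illustration only; the
# real system runs argument processes concurrently, and all names here
# are our own inventions).

proof_context = {'open_goals': ['g1', 'g2'], 'facts': {'g2': 'axiom_7'}}

def find_goal(board, ctx):          # argument process for the 'goal' slot
    if ctx['open_goals']:
        board['goal'] = ctx['open_goals'][0]

def find_fact(board, ctx):          # argument process for the 'premise' slot
    goal = board.get('goal')        # exploits entries other processes wrote
    if goal in ctx['facts']:
        board['premise'] = ctx['facts'][goal]

rules = {
    'close_by_fact': {'args': ['goal', 'premise'],
                      'procs': [find_goal, find_fact], 'rank': 2},
    'split_goal':    {'args': ['goal'], 'procs': [find_goal], 'rank': 1},
}

# Lower layer: each rule's processes fill that rule's blackboard.
applicable = []
for name, rule in rules.items():
    board = {}
    for proc in rule['procs']:
        proc(board, proof_context)
    if all(a in board for a in rule['args']):
        applicable.append((rule['rank'], name, board))

# Upper layer: rank the applicable rules and apply the most promising one.
applicable.sort(key=lambda t: -t[0])
print(applicable[0][1])   # 'split_goal': 'close_by_fact' finds no premise
                          # for goal 'g1' in this toy proof state
```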

Oants employs resource reasoning to guide search; this enables the
controlled integration (e.g., by specifying time-outs) of full-fledged
external reasoning systems, such as automated theorem provers, computer
algebra systems, or model generators, into the architecture. (Oants
also provides facilities to define and modify the processes at
run-time, but we do not use these advanced features in the case study
presented in this paper.)
external systems is modelled by inference rules, usually one for each
system. Their corresponding computations are encapsulated in one of
the independent processes in the architecture. For example, an
inference rule modelling the application of an ATP has its conclusion
argument set to be an open goal. A process can then place an open goal
on the blackboard, where it is picked up by a process that applies the
prover to it. Any computed proof or partial-proof from the external
system is again written to the blackboard from where it is
subsequently inserted into the proof object when the inference rule is
applied. While this setup enables proof construction by a
collaborative effort of diverse reasoning systems, the cooperation can
only be achieved via the central proof object. This means that all
partial results have to be translated back and forth between the
syntaxes of the integrated systems and the language of the proof
object. Since there are many types of integrated systems, the
language of the proof object -- a higher-order language even richer
than Leo's, together with a natural deduction calculus -- is
expressive but also cumbersome. This leads not only to a large
communication overhead, but also means that complex proof objects have
to be created (large clause sets need to be transformed into large
single formulae to represent them in the proof object; the support for
this in Oants to date is inefficient), even if the reasoning of all
systems involved is clause-based. Consequently, the cooperation
between external systems is typically rather
inefficient [BeJaKeSo01b].

In order to overcome the problem of the communication bottleneck described above, we devised a new method for the cooperation between a higher-order and a first-order theorem prover within Oants. Rather than modelling each theorem prover as a separate inference rule (and hence needing to translate the communication via the language of the central proof object), we model the cooperation between a higher-order (concretely, Leo) and a first-order theorem prover (in our case study Bliksem) in Oants as a single inference rule. The cooperation between these two theorem provers is carried out directly and not via the central proof object. This avoids translating clause sets into single formulae and back. While in our previous approach the cooperation between Leo and an FO-ATP was modelled at the upper layer of the Oants architecture, our new approach presented in this paper models their cooperation by exploiting the lower layer of the Oants blackboard architecture. This is not an ad hoc solution, but rather, it demonstrates Oants's flexibility in modelling the integration of cooperative reasoning systems.

Concretely, the single inference rule modelling the cooperation between Leo and a first-order theorem prover needs four arguments to be applicable: (1) an open proof goal, (2) a partial Leo proof, (3) a set of FO-like clauses in the partial proof, and (4) a first-order refutation proof for the set of FO-like clauses. Each of these arguments is computed, that is, its instantiation is found, by an independent process. The first process finds open goals in the central proof object and posts them on the blackboard associated with the new rule. The second process starts an instance of the Leo theorem prover for each new open goal on the blackboard. Each Leo instance maintains its own set of FO-like clauses. The third process monitors these clauses and, as soon as it detects a change in this set, that is, as soon as new FO-like clauses are added by Leo, writes the entire set of clauses to the blackboard. Once FO-like clauses are posted, the fourth process first translates each clause directly into a corresponding one in the format of the first-order theorem prover, and then starts the first-order theorem prover on them. Note that writing FO-like clauses on the blackboard is far less time-consuming than generating higher-order proof objects. As soon as either Leo or the first-order prover finds a refutation, the second process reports Leo's proof or partial proof to the blackboard, that is, it instantiates argument (2). Once all four arguments of our inference rule are instantiated, the rule can be applied and the open proof goal can be closed in the central proof object. That is, the open goal is proved by the cooperation between Leo and the first-order theorem prover. While applicability of the inference rule is being computed, the second and the fourth process concurrently spawn processes running Leo or a first-order prover, each working on a different set of FO-like clauses. When the inference rule is actually applied, all these prover instances working on the same open subgoal are stopped.
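The interplay of the four argument-computing processes can be sketched as follows. This is a minimal illustration in Python; all names (`RuleBlackboard`, `goal_finder`, etc.) are hypothetical stand-ins, not the actual Oants implementation, and the prover calls are mocked by constant results:

```python
import threading

# Sketch of the single inference rule: four argument slots on a blackboard,
# each filled by an independent process; the rule becomes applicable once
# all four slots are instantiated.

ARGS = ("open_goal", "leo_proof", "fo_like_clauses", "fo_refutation")

class RuleBlackboard:
    def __init__(self):
        self.slots = {}
        self.lock = threading.Lock()
        self.applicable = threading.Event()

    def post(self, name, value):
        with self.lock:
            self.slots[name] = value
            if all(a in self.slots for a in ARGS):
                self.applicable.set()   # all four arguments instantiated

def goal_finder(bb):
    # process 1: find an open goal in the central proof object
    bb.post("open_goal", "subgoal-1")

def leo_process(bb):
    # processes 2 and 3: run a Leo instance, monitor its FO-like clauses,
    # and write them (and later Leo's partial proof) to the blackboard
    bb.post("fo_like_clauses", ["a = b", "-P(a) | P(b)"])
    bb.post("leo_proof", "partial-leo-proof")

def fo_prover_process(bb):
    # process 4: translate the clauses and run a first-order prover on them
    bb.post("fo_refutation", "empty-clause")

def try_rule():
    bb = RuleBlackboard()
    procs = [threading.Thread(target=f, args=(bb,))
             for f in (goal_finder, leo_process, fo_prover_process)]
    for p in procs:
        p.start()
    ok = bb.applicable.wait(timeout=5)   # rule applicable => goal can be closed
    for p in procs:
        p.join()
    return ok

print(try_rule())   # -> True
```

The design point illustrated here is that the provers communicate only through blackboard slots, never through the central proof object.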

The cooperation can be carried out between any first-order theorem prover and Leo instantiated with any strategy, thus resulting in different instantiations of the inference rule discussed above. While several first-order provers are integrated in Oants and could be used, Bliksem was sufficient for the case study reported in this paper (see Sec. 4). In most cases, more than one Bliksem process was necessary. But as the problems were always concerned with only one subgoal, only one Leo process had to be started.

Our approach to the cooperation between a higher-order and a first-order theorem prover has many advantages. The main one is that the communication is restricted to the transmission of clauses, and thus it avoids intermediate translation into the language of the central proof object. This significantly reduces the communication overhead and makes effective proving of more involved theorems feasible. A disadvantage of this approach is that we cannot easily translate and integrate the two proof objects produced by Leo and Bliksem into the central proof object maintained by Oants, as is possible when applying only one prover per open subgoal. Providing such translation remains future work. The repercussions will be discussed in more detail in Sec. 3.4.

Crucial to a successful cooperation between Leo and a first-order ATP is obviously the generation of FO-like clauses. Leo always maintains a heap of FO-like clauses. In the current Leo system this heap remains rather small, since Leo's standard calculus intrinsically avoids primitive equality and instead provides a rule that replaces occurrences of primitive equality with their corresponding Leibniz definitions, which are higher-order. The Leibniz principle defines equality as follows:

    =_{oαα} := λx_α. λy_α. [∀P_{oα}. P x -> P y]

Leo also provides a rule which replaces syntactically non-unifiable unification constraints between terms of non-Boolean base type by their respective representations using Leibniz equality. While the clauses resulting from these rules are still refutable in Leo, they are not refutable by Bliksem without adding set theory axioms. We illustrate the effect by the following simple example, where a_i, b_i, and f_{i i} are constants:

    [¬P a ∨ P b]

This Leibniz clause, obtained from the equation a = b, contains a set variable P that can be suitably instantiated in Leo's higher-order search, but not by a first-order prover lacking comprehension axioms.

To circumvent this problem, we adapted the relevant rules in Leo. Instead of immediately constructing the Leibniz representation of clauses, an intermediate representation containing primitive equality is generated and dumped on the heap of FO-like clauses. As a consequence, additional useful FO-like clauses are accumulated and the heap can become quite large, in particular since we do not apply any subsumption to the set of FO-like clauses (this is generally done more efficiently by a first-order ATP anyway). Recent research has shown that Leibniz equality is generally very bad for automating higher-order proof search. Thus, future work in Leo includes providing support for full primitive equality and avoiding Leibniz equations.
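The adapted rule can be sketched as follows. This is a hypothetical illustration with string-based clauses, not Leo's actual data structures: the primitive equation is stored on the FO-like heap before its higher-order Leibniz form is constructed for Leo's own search.

```python
# Sketch of the adapted Leibniz-expansion rule: the primitive equation
# s = t is first dumped on the heap of FO-like clauses, then replaced by
# its Leibniz clause [-P(s) | P(t)] with a fresh set variable P.

fo_like_heap = []   # Leo's heap of FO-like clauses (no subsumption applied)

def expand_leibniz(s, t):
    fo_like_heap.append(f"{s} = {t}")      # intermediate FO-like clause
    return f"[-P({s}) | P({t})]"           # Leibniz form used in HO search

clause = expand_leibniz("a", "f(b)")
print(clause)         # -> [-P(a) | P(f(b))]
print(fo_like_heap)   # -> ['a = f(b)']
```

A first-order prover can use the primitive equation on the heap directly, while Leo continues with the Leibniz form.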

Clearly, soundness and completeness properties depend on the corresponding properties of the systems involved, in our case, of Leo and Bliksem.

While there are many advantages in guaranteeing correctness of proofs by checking them, it is worth noting that the combination of Leo and Bliksem is sound under the assumption that the two systems are sound. Namely, to prove a theorem it is sufficient to show that a subset of the clauses generated in the proof is inconsistent. If Leo generates an inconsistent set of clauses, then it does so correctly by assumption, be it an FO-like set or not. Assuming that the translation from FO-like clauses to truly first-order clauses preserves consistency and inconsistency, a set of clauses given to Bliksem is inconsistent only if Leo generated an inconsistent set of clauses in the first place. From the assumption that Bliksem is sound it follows that Bliksem will generate the empty clause only if the original clause set was inconsistent.

Thus, soundness of our cooperative approach critically relies only on
the soundness of the selected transformational mapping from FO-like
clauses to proper first-order clauses. We use the mapping from Tramp,
which has been previously shown to be sound and is based on
[Kerber-PhD]. Essentially, it injectively maps expressions such
as P(f(a)) to expressions such as @^1_pred(P, @^1_fun(f, a)), where the
@_pred and @_fun are new first-order operators describing
predicate and function application for particular types and arities.
The injectivity of the mapping guarantees soundness, since it allows
each proof step to be mapped back from first-order to higher-order.
Hence, our higher-order/first-order cooperative approach between Leo
and Bliksem is sound.
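The injective mapping can be sketched as a simple recursive translation. The operator names below (`@pred1`, `@fun1`) and the tuple-based term representation are illustrative only; the actual Tramp operators are additionally indexed by type:

```python
# Sketch of an injective HO-to-FO translation in the spirit of Tramp:
# application is made explicit through apply operators indexed by arity,
# so the atom P(f(a)) becomes @pred1(P, @fun1(f, a)).

def translate(term):
    """Translate a term, given as ('head', 'pred'|'fun', *args) or a string."""
    if isinstance(term, str):       # constant or variable
        return term
    head, kind, *args = term
    inner = ", ".join(translate(a) for a in args)
    return f"@{kind}{len(args)}({translate(head)}, {inner})"

print(translate(("P", "pred", ("f", "fun", "a"))))   # -> @pred1(P, @fun1(f, a))
```

Because distinct higher-order terms always yield distinct images, each first-order proof step can be read back as a higher-order step.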

The more interesting question is whether particular cooperation strategies will be complete as well. For instance, in Leo we may want to give higher preference to real higher-order steps which guarantee the generation of first-order clauses.

We conducted several experiments to evaluate our hybrid reasoning approach. In particular, we concentrated on the problems given in Table 1. We investigated several Leo strategies in order to compare Leo's individual performance with that of the Leo-Bliksem cooperation. Our example set differs from the one in [GanzingerStuber-03-cade] in that it contains some additional problems, and it also omits an entry for problem SET108+1. This problem addresses the universal class and therefore cannot be formalised in type theory in the same concise way as the other examples, but only in a way very similar to the one given in the TPTP.

Table 4 presents the results of our experiments. All timings given in the table are in seconds. The first column contains the TPTP identifier of the problem. The second column relates some of the problems to their counterparts in the Journal of Formalized Mathematics (JFM; see mizar.org/JFM) where they originally stem from. This eases the comparison with the results in [C3,T2], where the problems from the JFM article Boolean Properties of Sets were already solved: the problems are named with prefix `B:'. Prefix `RS1:' stands for the JFM article Relations Defined on Sets. The third column lists the TPTP (v3.0.1 as of 20 January 2005, see http://www.tptp.org) difficulty rating of the problem, which indicates how hard the problem is for first-order ATPs (difficulty rating 1.00 indicates that no TPTP prover can solve the problem).

The fourth, fifth and sixth columns list whether Saturate, Muscadet (v2.4) and E-Setheo (csp04), respectively, can (+) or cannot (-) solve a problem. The seventh column lists the timing results for Vampire (v7). The results for Saturate are taken from [GanzingerStuber-03-cade] (a `?' in Table 4 indicates that the result was not listed in [GanzingerStuber-03-cade] and is thus unavailable). The results for Muscadet and E-Setheo are taken from the on-line version of the solutions provided with the TPTP. Since the listed results were obtained from different experiments on different platforms, a run-time comparison would be unfair, and was thus not carried out. The timings for Vampire, on the other hand, are based on private communication with A. Voronkov and were obtained on a computer with a specification very similar to the one we used for the Leo-Bliksem timings. Note that the results for Vampire and E-Setheo reported in [GanzingerStuber-03-cade] differ for some of the problems from those in the TPTP. This is probably due to different versions of the systems tested; for instance, the TPTP uses Vampire version 7, while the results reported in [GanzingerStuber-03-cade] are based on version 5. The results in columns four through seven show that some problems are still very hard for first-order ATPs, as well as for the special purpose theorem prover Muscadet. Columns eight and nine in Table 4 list the results for Leo alone and for Leo-Bliksem, respectively. Each of these two columns is further divided into sub-columns to allow for a detailed comparison.

**Table 4.** Experimental results (all timings in seconds; `L-B' = Leo-Bliksem).

| TPTP Problem | Mizar Problem | Difficulty | Saturate | Muscadet | E-Setheo | Vampire 7 | Leo Strat. | Leo Cl. | Leo Time | L-B Cl. | L-B Time | FOcl | FOtm | GnCl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SET014+4 | | .67 | + | + | + | .01 | ST | 41 | .16 | 34 | 6.76 | 19 | .01 | 7 |
| SET017+1 | | .56 | - | - | + | .03 | EXT | 3906 | 57.52 | 25 | 8.54 | 16 | .01 | 74 |
| SET066+1 | | 1.00 | ? | - | - | - | - | - | - | 26 | 6.80 | 20 | 10 | 56 |
| SET067+1 | | .56 | + | + | + | .04 | ST | 6 | .02 | 13 | .32 | 16 | .01 | 12 |
| SET076+1 | | .67 | + | - | + | .00 | - | - | - | 10 | .47 | 18 | .01 | 35 |
| SET086+1 | | .22 | + | - | + | .04 | ST | 4 | .01 | 4 | .01 | N/A | N/A | N/A |
| SET096+1 | | .56 | + | - | + | .03 | - | - | - | 27 | 7.99 | 14 | .01 | 25 |
| SET143+3 | B:67 | .67 | + | + | + | 68.71 | EIR | 37 | .38 | 33 | 7.93 | 18 | .01 | 19 |
| SET171+3 | B:71 | .67 | + | + | - | 108.31 | EIR | 36 | .56 | 25 | 4.75 | 19 | .01 | 20 |
| SET580+3 | B:23 | .44 | + | + | + | 14.71 | EIR | 25 | .19 | 6 | 2.73 | 8 | .01 | 13 |
| SET601+3 | B:72 | .22 | + | + | + | 168.40 | EIR | 145 | 2.20 | 55 | 4.96 | 8 | .01 | 13 |
| SET606+3 | B:77 | .78 | + | - | + | 62.02 | EIR | 21 | .33 | 17 | 10.8 | 15 | .01 | 5 |
| SET607+3 | B:79 | .67 | + | + | + | 65.57 | EIR | 22 | .31 | 17 | 7.79 | 15 | .01 | 6 |
| SET609+3 | B:81 | .89 | + | + | - | 161.78 | EIR | 37 | .60 | 26 | 6.50 | 19 | 10 | 17 |
| SET611+3 | B:84 | .44 | + | - | + | 60.20 | EIR | 996 | 12.69 | 72 | 32.14 | 38 | .01 | 101 |
| SET612+3 | B:85 | .89 | + | - | - | 113.33 | EIR | 41 | .54 | 18 | 3.95 | 6 | .01 | 7 |
| SET614+3 | B:88 | .67 | + | + | - | 157.88 | EIR | 38 | .46 | 19 | 4.34 | 16 | .01 | 17 |
| SET615+3 | B:89 | .67 | + | + | - | 109.01 | EIR | 38 | .57 | 17 | 3.59 | 6 | .01 | 9 |
| SET623+3 | B:99 | 1.00 | ? | - | - | - | EXT | 43 | 8.84 | 23 | 9.54 | 10 | .01 | 14 |
| SET624+3 | B:100 | .67 | + | - | + | .04 | ST | 4942 | 34.71 | 54 | 9.61 | 46 | .01 | 212 |
| SET630+3 | B:112 | .44 | + | - | + | 60.39 | EIR | 11 | .07 | 6 | .08 | 8 | 10 | 4 |
| SET640+3 | RS1:2 | .22 | + | - | + | 70.41 | EIR | 2 | .01 | 2 | .01 | N/A | N/A | N/A |
| SET646+3 | RS1:8 | .56 | + | - | + | 59.63 | EIR | 2 | .01 | 2 | .01 | N/A | N/A | N/A |
| SET647+3 | RS1:9 | .56 | + | - | + | 64.21 | EIR | 26 | .15 | 13 | .30 | 13 | .01 | 15 |
| SET648+3 | RS1:10 | .56 | + | - | + | 64.22 | EIR | 26 | .15 | 14 | .30 | 13 | .01 | 16 |
| SET649+3 | RS1:11 | .33 | - | - | + | 63.77 | EIR | 45 | .30 | 29 | 5.49 | 12 | .01 | 16 |
| SET651+3 | RS1:13 | .44 | - | - | + | 63.88 | EIR | 20 | .10 | 11 | .16 | 10 | 10 | 11 |
| SET657+3 | RS1:19 | .67 | + | - | + | 1.44 | EIR | 2 | .01 | 2 | .01 | N/A | N/A | N/A |
| SET669+3 | RS1:19 | .22 | - | - | + | .34 | EI | 35 | .22 | 35 | .23 | N/A | N/A | N/A |
| SET670+3 | RS1:33 | 1.00 | ? | - | - | - | EXT | 15 | .17 | 17 | .36 | 16 | .01 | 6 |
| SET671+3 | RS1:34 | .78 | - | - | + | 218.02 | EIR | 78 | .64 | 7 | 2.71 | 10 | .01 | 14 |
| SET672+3 | RS1:35 | 1.00 | ? | - | - | - | EXT | 27 | .4 | 30 | .70 | 21 | .01 | 11 |
| SET673+3 | RS1:36 | .78 | - | - | + | 47.86 | EIR | 78 | .65 | 14 | 5.66 | 14 | .01 | 16 |
| SET680+3 | RS1:47 | .33 | + | - | + | .07 | ST | 185 | .88 | 29 | 4.61 | 18 | .01 | 24 |
| SET683+3 | RS1:50 | .22 | + | - | + | .06 | ST | 46 | .20 | 35 | 8.90 | 18 | 10 | 24 |
| SET684+3 | RS1:51 | .78 | - | - | + | .33 | ST | 275 | 2.45 | 46 | 5.95 | 26 | .01 | 47 |
| SET686+3 | RS1:53 | .56 | - | - | + | .11 | ST | 274 | 2.36 | 46 | 5.37 | 26 | .01 | 46 |
| SET716+4 | | .89 | + | + | - | - | ST | 39 | .45 | 18 | 3.81 | 18 | .01 | 118 |
| SET724+4 | | .89 | + | + | - | - | EXT | 154 | 2.75 | 18 | 7.21 | 15 | 10 | 23 |
| SET741+4 | | 1.00 | ? | - | - | - | - | - | - | - | - | - | - | - |
| SET747+4 | | .89 | - | + | - | - | ST | 34 | .46 | 25 | 1.11 | 18 | 10 | 10 |
| SET752+4 | | .89 | ? | + | - | - | - | - | - | 50 | 6.60 | 48 | .01 | 4363 |
| SET753+4 | | .89 | ? | + | - | - | - | - | - | 15 | 3.07 | 12 | 10 | 19 |
| SET764+4 | | .56 | + | + | + | .02 | EI | 9 | .05 | 8 | .04 | N/A | N/A | N/A |
| SET770+4 | | .89 | + | + | - | - | - | - | - | - | - | - | - | - |

All our experiments (for the values of Leo and Leo-Bliksem) were conducted on a 2.4 GHz Xeon machine with 1GB of memory and an overall time limit of 100 seconds. For our experiments with Leo alone (column eight in Table 4) we tested four different strategies. Mainly, they differ in their treatment of equality and extensionality: from immediate expansion of primitive equality with Leibniz equality and limited extensionality reasoning, STANDARD (ST); to immediate expansion of primitive equality and moderate extensionality reasoning, EXT; to delayed expansion of primitive equality and moderate extensionality reasoning, EXT-INPUT (EI); and finally to delayed expansion of primitive equality and advanced recursive extensionality reasoning, EXT-INPUT-RECURSIVE (EIR). Column eight in Table 4 presents the fastest strategy for the respective problem (Strat.), the number of clauses generated by Leo (Cl.), and the total runtime (Time). While occasionally more than one Leo strategy could solve a problem, it should be noted that none of the strategies was successful for all the problems solved by Leo.

In contrast to the experiments with Leo alone, we used only the EXT-INPUT strategy for our experiments with the Leo-Bliksem cooperation. Column nine in Table 4 presents the number of clauses generated by Leo (Cl.) together with the time (Time), and, in addition, the number of first-order clauses sent to Bliksem (FOcl), the time used by Bliksem (FOtm), and the number of clauses generated by Bliksem (GnCl). Note that we give the data only for the first Bliksem instance that actually succeeded in solving the problem. This time also includes the time needed to write and process input and output files over the network. While Leo and the instances of Bliksem were running in separate threads (each run of Bliksem was given a 50 second time limit), the figures given in the `Time' column reflect the overall time needed for a successful proof. That is, they contain the time needed by all concurrent processes: Leo's own process as well as those processes administering the various instances of Bliksem. Since all these processes ran on a single processor, there is potential to improve the overall runtimes by using real multiprocessing.

Note also that the number of clauses in Leo's search space is typically low since subsumption is enabled. Subsumption, however, was not enabled for the accumulation of FO-like clauses in Leo's bag of FO-like clauses. This is why there are usually more clauses in this bag (which is sent to Bliksem) than in Leo's search space. Finally, observe that for some problems a refutation was found already after Leo's clausal normalisation, and therefore Bliksem was not applicable (N/A).

While Leo itself can solve a majority of the considered problems with some strategy, the Leo-Bliksem cooperation solves more problems and, moreover, needs only a single Leo strategy. We can also observe that for many problems that are relatively hard for Leo alone (e.g., SET017+1, SET611+3, SET624+3), the Leo-Bliksem cooperation not only solves them more quickly, but sometimes also reduces them to relatively small higher-order pre-processing steps with subsequent easy first-order proofs, as, for instance, in the case of SET017+1.

From a mathematical viewpoint the investigated problems are trivial and, hence, they should ideally be reliably and very efficiently solvable within a proof assistant. This has been achieved for the examples in Table 4 (except for SET741+4 and SET770+4) by our hybrid approach. While some of the proof attempts now require slightly more time than when using Leo alone with a specialised strategy, they are, in most cases, still faster than when proving with a first-order system.

Related to our approach is the Techs system [DeFu99], which
realises a cooperation between a set of heterogeneous first-order theorem
provers. Similarly to our approach, partial results in Techs are
exchanged between the different theorem provers in the form of clauses. The
main difference from the work of Denzinger *et al.* (and other related
architectures like [FisIre:mapp98]) is that our system bridges between
higher-order and first-order automated theorem proving. Also, unlike
Techs, we provide a declarative specification framework for modelling
external systems as cooperating, concurrent processes that can be
(re-)configured at run-time. Also related is the work of
Hurd [hurd2002e], which realises a generic interface between HOL and
first-order theorem provers. It is similar to the solution previously
achieved by Tramp [MeierCade00] in Omega, which serves as a basis for
the sound integration of ATPs into Oants. Both approaches pass essentially
first-order clauses to first-order theorem provers and then translate their
results back into HOL and Omega, respectively. Some further related work on the
cooperation of Isabelle with Vampire is presented
in [DBLP:conf/cade/MengP04]. The main difference between our work and these
related systems is that our system calls first-order provers from
within higher-order proof search, which is not the case for
[hurd2002e,MeierCade00,DBLP:conf/cade/MengP04].

One of the motivations for our work is to show that the cooperation of higher-order and first-order automated theorem provers can be very successful and effective. The results of our case study provide evidence for this: our non-optimised system outperforms state-of-the-art first-order theorem provers and their ad hoc extensions, such as Saturate [GanzingerStuber-03-cade], on 45 mathematical problems chosen from the TPTP SET category. Among them are four problems which cannot be solved by any TPTP system to date. In contrast to the first-order situation, these problems can in fact be proved reliably in our approach from first principles, that is, without dropping relevant base axioms of the underlying set theory, and moreover, without the need to provide relevant lemmata and definitions by hand.

The results of our case study motivate further research in the automation of higher-order theorem proving and the experimentation with different higher-order to first-order transformation mappings (such as the ones used by Hurd) that support our hybrid reasoning approach. They also provide further evidence for the usefulness of the Oants approach as described in [BeSo01,BeJaKeSo01b] for flexibly modelling the cooperation of reasoning systems.

Our results also motivate the need for a higher-order extension of the TPTP library in which alternative higher-order problem formalisations are linked with their first-order counterparts so that first-order theorem provers could also be evaluated against higher-order systems (and vice versa).

Future work is to investigate how far our approach scales up to more complex problems and more advanced mathematical theories. In less trivial settings than the ones discussed in this paper, we will face the problem of selecting and adding relevant lemmata to avoid immediate reduction to first principles, and of appropriately instantiating set variables. Relevant related work for this setting is Bishop's approach to selectively expanding definitions, presented in [BiAn98], and Brown's PhD thesis on set comprehension in Church's type theory [Brown04].

- [Andrews02]
P. Andrews.
An Introduction to Mathematical Logic and Type Theory: To Truth
through Proof.
Number 27 in Applied Logic Series. Kluwer, 2002.
- [T2]
C. Benzmüller.
Equality and Extensionality in Higher-Order Theorem Proving.
PhD thesis, University of Saarland, Germany, 1999.
- [C5]
C. Benzmüller.
Extensional higher-order paramodulation and RUE-resolution.
Proc. of CADE-16, LNAI 1632, pages 399-413.
Springer, 1999.
- [J5]
C. Benzmüller.
Comparing approaches to resolution based higher-order theorem
proving.
Synthese, 133(1-2):203-235, 2002.
- [BeJaKeSo01b]
C. Benzmüller, M. Jamnik, M. Kerber, and V. Sorge.
Experiments with an Agent-Oriented Reasoning System.
Proc. of KI 2001, LNAI 2174, pages 409-424. Springer, 2001.
- [C3]
C. Benzmüller and M. Kohlhase.
LEO - a higher-order theorem prover.
Proc. of CADE-15, LNAI 1421. Springer, 1998.
- [BeSo98b]
C. Benzmüller and V. Sorge.
A Blackboard Architecture for Guiding Interactive Proofs.
Proc. of AIMSA'98, LNAI 1480, pages
102-114. Springer, 1998.
- [BeSo01]
C. Benzmüller and V. Sorge.
Oants - An open approach at combining Interactive and Automated
Theorem Proving.
Proc. of Calculemus-2000. AK Peters, 2001.
- [BiAn98]
M. Bishop and P. Andrews.
Selectively instantiating definitions.
Proc. of CADE-15, LNAI 1421. Springer, 1998.
- [Brown04]
C. E. Brown.
Set Comprehension in Church's Type Theory.
PhD thesis, Dept. of Mathematical Sciences, Carnegie Mellon
University, USA, 2004.
- [Nivelle99]
H. de Nivelle.
The Bliksem Theorem Prover, Version 1.12.
Max-Planck-Institut, Saarbrücken, Germany, 1999.
http://www.mpi-sb.mpg.de/~bliksem/manual.ps.
- [DeFu99]
J. Denzinger and D. Fuchs.
Cooperation of Heterogeneous Provers.
Proc. IJCAI-16, pages 10-15. Morgan Kaufmann, 1999.
- [FisIre:mapp98]
M. Fisher and A. Ireland.
Multi-agent proof-planning.
CADE-15 Workshop "Using AI methods in Deduction", 1998.
- [GanzingerStuber-03-cade]
H. Ganzinger and J. Stuber.
Superposition with equivalence reasoning and delayed clause normal
form transformation.
Proc. of CADE-19, LNAI 2741. Springer,
2003.
- [hurd2002e]
J. Hurd.
An LCF-style interface between HOL and first-order logic.
Automated Deduction -- CADE-18,
LNAI 2392, pages 134-138. Springer, 2002.
- [Kerber-PhD]
M. Kerber.
On the Representation of Mathematical Concepts and their
Translation into First Order Logic.
PhD thesis, Universität Kaiserslautern, Germany, 1992.
- [MeierCade00]
A. Meier.
TRAMP: Transformation of Machine-Found Proofs into
Natural Deduction Proofs at the Assertion Level.
Proc. of CADE-17, LNAI 1831. Springer, 2000.
- [DBLP:conf/cade/MengP04]
J. Meng and L. C. Paulson.
Experiments on supporting interactive proof using resolution.
Proc. of IJCAR 2004, LNCS 3097, pages
372-384. Springer, 2004.
- [NHRV01]
R. Nieuwenhuis, Th. Hillenbrand, A. Riazanov, and A. Voronkov.
On the evaluation of indexing techniques for theorem proving.
Proc. of IJCAR-01, LNAI 2083, pages
257-271. Springer, 2001.
- [Pastre01]
D. Pastre.
Muscadet 2.3: A knowledge-based theorem prover based on natural deduction.
Proc. of IJCAR-01, LNAI 2083, pages
685-689. Springer, 2001.
- [vampire01]
A. Riazanov and A. Voronkov.
Vampire 1.1 (system description).
Proc. of IJCAR-01, LNAI 2083, pages
376-380. Springer, 2001.
- [Sorge01phd]
V. Sorge.
OANTS: A Blackboard Architecture for the Integration of Reasoning Techniques into Proof Planning.
PhD thesis, University of Saarland, Germany, 2001.
- [SW:TABLEAUX-2000]
G. Stenz and A. Wolf.
E-SETHEO: An Automated Theorem Prover - System Abstract.
Proc. of TABLEAUX 2000, LNAI 1847, pages 436-440. Springer, 2000.
- [SuSu98]
G. Sutcliffe and C. Suttner.
The TPTP Problem Library: CNF Release v1.2.1.
JAR, 21(2):177-203, 1998.

© 2005, Springer-Verlag. LNAI 3452, p.415-431. LPAR, Christoph Benzmüller, Volker Sorge, Mateja Jamnik, Manfred Kerber
