Can a Higher-Order and a First-Order Theorem Prover Cooperate?
This work was supported by EPSRC grant GR/M22031 and DFG-SFB 378 (first author), EU Marie-Curie-Fellowship HPMF-CT-2002-01701 (second author), and EPSRC Advanced Research Fellowship GR/R76783 (third author).

Christoph Benzmüller1      Volker Sorge2      Mateja Jamnik3      Manfred Kerber2

1Fachbereich Informatik, Universität des Saarlandes
66041 Saarbrücken, Germany ( 2School of Computer Science, The University of Birmingham
Birmingham B15 2TT, England, UK (, 3University of Cambridge Computer Laboratory
Cambridge CB3 0FD, England, UK (


State-of-the-art first-order automated theorem proving systems have reached considerable strength over recent years. However, in many areas of mathematics they are still a long way from reliably proving theorems that would be considered relatively simple by humans. For example, when reasoning about sets, relations, or functions, first-order systems still exhibit serious weaknesses. While it has been shown in the past that higher-order reasoning systems can solve problems of this kind automatically, the complexity inherent in their calculi and their inefficiency in dealing with large numbers of clauses prevent these systems from solving a whole range of problems. We present a solution to this challenge by combining a higher-order and a first-order automated theorem prover, both based on the resolution principle, in a flexible and distributed environment. By this we can exploit concise problem formulations without forgoing efficient reasoning on first-order subproblems. We demonstrate the effectiveness of our approach on a set of problems still considered non-trivial for many first-order theorem provers.

1.  Introduction

When dealing with problems containing higher-order concepts, such as sets, functions, or relations, today's state-of-the-art first-order automated theorem provers (ATPs) still exhibit weaknesses on problems considered relatively simple by humans (cf. [GanzingerStuber-03-cade]). One reason is that the problem formulations use an encoding in a first-order set theory, which makes it particularly challenging when trying to prove theorems from first principles, that is, basic axioms. Therefore, to aid ATPs in finding proofs, problems are often enriched by hand-picked additional lemmata, or axioms of the selected set theory are dropped leaving the theory incomplete. This has recently motivated extensions of state-of-the-art first-order calculi and systems, as for example presented in [GanzingerStuber-03-cade] for the Saturate system. The extended Saturate system can solve some problems from the SET domain in the TPTP [SuSu98] which Vampire [vampire01] and E- Setheo's [SW:TABLEAUX-2000] cannot solve.

While it has already been shown in [C3,T2] that many problems of this nature can be easily proved from first principles using a concise higher-order representation and the higher-order resolution ATP Leo, the combinatorial explosion inherent in Leo's calculus prevents the prover from solving a whole range of possible problems with one universal strategy. Often higher-order problems require only relatively few but essential steps of higher-order reasoning, while the overwhelming part of the reasoning is first-order or even propositional level. This suggests that Leo's performance could be improved when combining it with a first-order ATP to search efficiently for a possible refutation in the subset of those clauses that are essentially first-order.

The advantages of such a combination -- further discussed in Sec. 2 -- are not only that many problems can still be efficiently shown from first principles in a general purpose approach, but also that problems can be expressed in a very concise way. For instance, we present 45 problems from the SET domain of the TPTP-v3.0.1, together with their entire formalisation in less than two pages in this paper, which is difficult to achieve within a framework that does not provide lambda-abstraction. We use this problem set, which is an extension of the problems considered in [GanzingerStuber-03-cade], in Sec. 4 to show the effectiveness of our approach. While many of the considered problems can be proved by Leo alone with some strategy, the combination of Leo with the first-order ATP Bliksem [Nivelle99] is not only able to show more problems, but also needs only a single strategy to solve them. Several of our problems are considered very challenging by the first-order community and five of them (of which Leo can solve four) have a TPTP rating of 1.00, saying that they cannot be solved by any TPTP prover to date.

Technically, the combination -- described in more detail in Sec. 3 -- has been realised in the concurrent reasoning system Oants [Sorge01phd,BeSo01] which enables the cooperation of hybrid reasoning systems to construct a common proof object. In our past experiments, Oants has been successfully employed to check the validity of set equations using higher-order and first-order ATPs, model generation, and computer algebra [BeJaKeSo01b]. While this already enabled a cooperation between Leo and a first-order ATP, the proposed solution could not be classified as a general purpose approach. A major shortcoming was that all communication of partial results had to be conducted via the common proof object, which was very inefficient for hard examples. Thus, the solved examples from set theory were considered too trivial, albeit they were often similar to those still considered challenging in the TPTP in the first-order context. In this paper we now present a novel approach to the cooperation between Leo and Bliksem inside Oants by decentralising communication. This leads not only to a higher overall efficiency -- Sec. 4 details our results -- but also to a general purpose approach based on a single strategy in Leo.

2.  Why Linking Higher-Order and First-Order?

Existing higher-order ATPs generally exhibit deficits in efficiently reasoning with first-order problems for several reasons. Unlike in the case of first-order provers, for which sophisticated calculi and strategies, as well as advanced implementation techniques, such as term indexing [NHRV01], have been developed, fully mechanisable higher-order calculi are still at a comparably early stage of development. Some problems are much harder in higher-order, for instance, unification is undecidable, strong constraining term- and literal-orderings are not available, extensionality reasoning and set variable instantiation has to be addressed. Nevertheless, for some mathematical problem domains, such as naive set theory, for instance, automated higher-order reasoning performs very well.

We motivate the need for linking higher-order and first-order ATPs with some examples from Table 1.

-------- -------------
Example[SET]   Problem Formalisation
Example[014+4]   FORALL Xo alpha ,Yo alpha , Ao alpha . [[(X subseteq A)  &  (Y subseteq A)] -> ((X union Y) subseteq A)]
Example[017+1]   FORALL x alpha , y alpha , z alpha . [UnOrderedPair(x,y) = UnOrderedPair(x,z) -> y = z]
Example[066+1]   FORALL x alpha , y alpha . [UnOrderedPair(x,y) = UnOrderedPair(y,x)
Example[067+1]   FORALL x alpha , y alpha . [(UnOrderedPair(x,x) subseteq UnOrderedPair(x,y))]
Example[076+1]   FORALL x alpha , y alpha . FORALL Zo alpha . x in Z &  y in Z-> (UnOrderedPair(x,y) subseteq Z)
Example[086+1]   FORALL x alpha . EXISTS y alpha . [y in Singleton(x)]
Example[096+1]   FORALL Xo alpha , y alpha . [(X subseteq Singleton(y)) -> [X = EMPTYSET  |  X = Singleton(y)]]
Example[143+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [((X intersect Y) intersect Z) = (X intersect (Y intersect Z))]
Example[171+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [(X union (Y intersect Z)) = ((X union Y) intersect (X union Z))]
Example[580+3]   FORALL Xo alpha , Yo alpha , u alpha . [u in ExclUnion(X,Y) <-> [u in X <-> u NOT  in Y]]
Example[601+3]   FORALL. Xo alpha , Yo alpha , Zo alpha [((X intersect Y) union ((Y intersect Z) union (Z intersect X))) = ((X union Y) intersect ((Y union Z) intersect (Z union X)))]
Example[606+3]   FORALL Xo alpha , Yo alpha . [(X WITHOUT (X intersect Y)) = (X WITHOUT Y)]
Example[607+3]   FORALL Xo alpha , Yo alpha . [(X union (Y WITHOUT X)) = (X union Y)]
Example[609+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [(X WITHOUT (Y WITHOUT Z)) = ((X WITHOUT Y) union (X intersect Z))]
Example[611+3]   FORALL Xo alpha , Yo alpha . [(X intersect Y) = EMPTYSET <-> (X WITHOUT Y) = X]
Example[612+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [(X WITHOUT (Y union Z)) = ((X WITHOUT Y) intersect (X WITHOUT Z))]
Example[614+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [((X WITHOUT Y) WITHOUT Z) = (X WITHOUT (Y union Z))]
Example[615+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [((X union Y) WITHOUT Z) = ((X WITHOUT Z) union (Y WITHOUT Z))]
Example[623+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [ExclUnion(ExclUnion(X,Y),Z) = ExclUnion(X,ExclUnion(Y,Z))]
Example[624+3]   FORALL Xo alpha , Yo alpha , Zo alpha . [Meets(X,(Y union Z)) <-> [Meets(X,Y)  |  Meets(X,Z)]]
Example[630+3]   FORALL Xo alpha , Yo alpha . [Misses((X intersect Y),ExclUnion(X,Y))]
Example[640+3]   FORALL Ro beta alpha , Qo beta alpha . [Subrel(R,Q) -> Subrel(R,((lambda u alpha .T)*(lambda v beta . T)))]
Example[646+3]   FORALL x alpha , y beta . [Subrel(Pair(x,y),((lambda u alpha .T)*(lambda v beta .T)))]
Example[647+3]   FORALL Ro beta alpha , Xo alpha . [(RDom(R) subseteq X) -> Subrel(R,(X*RCoDom(R)))]
Example[648+3]   FORALL Ro beta alpha , Yo beta . [(RCoDom(R) subseteq Y) -> Subrel(R,(RDom(R)*Y))]
Example[649+3]   FORALL Ro beta alpha , Xo alpha , Yo beta . [[(RDom(R) subseteq X)  &  (RCoDom(R) subseteq Y)] -> Subrel(R,(X*Y))]
Example[651+3]   FORALL Ro beta alpha . [(RDom(R) subseteq Ao alpha ) -> Subrel(R,(A*(lambda u beta .T)))]
Example[657+3]   FORALL Ro beta alpha . [(Field(R) subseteq ((lambda u alpha .T) union (lambda v beta .T)))]
Example[669+3]   FORALL Ro alpha alpha . [Subrel(Id(lambda u alpha .T),R) -> [((lambda u alpha .T) subseteq RDom(R))  &  (lambda u alpha .T) = RCoDom(R)]]
Example[670+3]   FORALL Zo alpha , Ro beta alpha , Xo alpha Yo beta . [IsRelOn(R,X,Y) -> IsRelOn(RestrictRDom(R,Z),Z,Y) ]
Example[671+3]   FORALL Zo alpha , Ro beta alpha , Xo alpha , Yo beta . [[IsRelOn(R,X,Y)  &  (X subseteq Z)] -> RestrictRDom(R,Z) = R]
Example[672+3]   FORALL Zo beta , Ro beta alpha , Xo alpha Yo beta . [IsRelOn(R,X,Y) -> IsRelOn(Restric(tRCodom(R,Z)),X,Z) ]
Example[673+3]   FORALL Zo beta , Ro beta alpha , Xo alpha , Yo beta . [[IsRelOn(R,X,Y)  &  (Y subseteq Z)] -> Restric(tRCodom(R,Z)) = R]
Example[680+3]   FORALL Ro beta alpha , Xo alpha , Yo beta . [IsRelOn(R,X,Y) ->
[FORALL u alpha . u in X  ->  [u in RDom(R) <-> EXISTS v beta . v in Y & R(u,v)]]]
Example[683+3]   FORALL Ro beta alpha , Xo alpha , Yo beta . [IsRelOn(R,X,Y) ->
[FORALL v beta . v in Y  ->  [v in RCoDom(R) ->EXISTS u alpha . u in X & u in RDom(R)]]]
Example[684+3]   FORALL Po beta alpha , Ro gamma beta , x alpha , z gamma . [RelComp(P,R) x z <-> EXISTS y beta . P x y  &  R y z]
Example[686+3]   FORALL Zo alpha , Ro gamma beta , x alpha . [x in InverseImageR(R,Z) <-> EXISTS y alpha . R x y  &  x in Z]
Example[716+4]   FORALL F beta alpha , G gamma beta . [[Inj(F)  &  Inj(G)] -> Inj((F o G))]
Example[724+4]   FORALL F beta alpha , G gamma beta , H gamma beta . [[(G o F) = (H o F)  &  Surj(F)] -> G = H]
Example[741+4]   FORALL F beta alpha , G gamma beta , H alpha gamma . [[Inj((F o G) o H)  &  Surj((G o H) o F)  &  Surj((H o F) o G)] -> Bij(H)]
Example[747+4]   FORALL F beta alpha , G gamma beta , <= 1o alpha alpha , <= 2o beta beta , <= 3o gamma gamma . [[IncreasingF(F, <= 1, <= 2)  &  DecreasingF(G, <= 2, <= 3)] ->
DecreasingF((G o F), <= 1, <= 3)]
Example[752+4]   FORALL Xo alpha , Yo alpha , F beta alpha . [InverseImageF(F,(X union Y)) = (InverseImageF(F,X) union InverseImageF(F,Y))]
Example[753+4]   FORALL Xo alpha , Yo alpha , F beta alpha . [(InverseImageF(F,(X intersect Y)) subseteq (InverseImageF(F,X) intersect InverseImageF(F,Y)))]
Example[764+4]   FORALL F beta alpha . [InverseImageF(F, EMPTYSET ) = EMPTYSET ]
Example[770+4]   FORALL Ro beta alpha , Qo beta alpha . [[EquivRel(R)  &  EquivRel(Q)] ->
[EquivClasses(R) = EquivClasses(Q)  |  Disjoint(EquivClasses(R),EquivClasses(Q))]]
-------- -------------

1. Problems from TPTP for the evaluation of Oants.

It contains a range of challenging problems taken from the TPTP, against which we will evaluate our system in Sec. 4. The problems are given by the identifiers used in the SET domain of the TPTP, and are formalised in a variant of Church's simply typed lambda-calculus with prefix polymorphism. In classical type theory terms and all their sub-terms are typed. Polymorphism allows the introduction of type variables such that statements can be made for all types. For instance, in problem SET014+4 the universally quantified variable Xo alpha denotes a mapping from objects of type alpha to objects of type o. We use Church's notation o alpha , which stands for the functional type alpha  ->  o. The reader is referred to [Andrews02] for a more detailed introduction. In the remainder, o will denote the type of truth values, and small Greek letters will denote arbitrary types. Thus, Xo alpha (resp. its a-longform lambda y alpha . Xy) is actually a characteristic function denoting the set of elements of type alpha , for which the predicate associated with X holds. As further notational convention, we use capital letter variables to denote sets, functions, or relations, while lower case letters denote individuals. Types are usually only given in the first occurrence of a variable and omitted if inferable from the context.

The problems in Table 1 employ defined concepts that are specified in a knowledge base of hierarchical theories that Leo has access to. All concepts necessary for defining our problems in Table 1 are given in Table 2.

Defined Notions in Theory Typed Set
_ in _   lambda x alpha , Ao alpha . [A x]
EMPTYSET   [lambda x alpha . F]
(_ subseteq _)   lambda Ao alpha , Bo alpha . [FORALL x alpha . x in A  ->  x in B]
(_ union _)   lambda Ao alpha , Bo alpha . [lambda x alpha . x in A v x in B]
(_ intersect _)   lambda Ao alpha , Bo alpha . [lambda x alpha . x in A & x in B]
Complement(_)   lambda Ao alpha . [lambda x alpha . x  NOT in A]
(_ WITHOUT _)   lambda Ao alpha , Bo alpha . [lambda x alpha . x in A & x  NOT in B]
ExclUnion(_,_)   lambda Ao alpha , Bo alpha . [(A WITHOUT B)  union  (B WITHOUT A)]
Disjoint(_,_)   lambda Ao alpha , Bo alpha . [A union B = {}]
Meets(_,_)   lambda Ao alpha , Bo alpha . [EXISTS x alpha . x in A & x in B]
Misses(_,_)   lambda Ao alpha , Bo alpha . [NOT  EXISTS x alpha . x in A & x in B]
Defined Notions in Theory Relation
UnOrderedPair(_,_)   lambda x alpha , y alpha . [lambda u alpha . u = x v u = y]
Singleton(_)   lambda x alpha . [lambda u alpha . u = x]
Pair(_,_)   lambda x alpha , y beta . [lambda u alpha , v beta . u = x & v = y]
(_*_)   lambda Ao alpha , Bo beta . [lambda u alpha , v beta . u in A & v in B]
RDom(_)   lambda Ro beta alpha . [lambda x alpha . EXISTS y beta . R x y]
RCoDom(_)   lambda Ro beta alpha . [lambda y beta . EXISTS x alpha . R x y]
Subrel(_,_)   lambda Ro beta alpha , Qo beta alpha . [FORALL x alpha . FORALL y alpha . R x y  ->  Q x y]
Id(_)   lambda Ao alpha . [lambda x alpha , y alpha . x in A & x = y]
Field(_)   lambda Ro beta alpha . [ RDom(B)  union  RCoDom(R)]
IsRelOn(_,_,_)   lambda Ro beta alpha , Ao alpha . lambda Bo beta . [FORALL x alpha , y beta . R x y  ->  (x in A & x in B)]
Restric(tRCodom(_,_))   lambda Ro beta alpha , Ao alpha . [lambda x alpha , y beta . x in A & R x y]
RelComp(_,_)   lambda Ro beta alpha , Qo gamma beta . [lambda x alpha , z gamma . EXISTS y beta . R x y & R y z]
InverseImageR(_,_)   lambda Ro beta alpha , Bo beta . [lambda x alpha . EXISTS y beta . y in B & R x y]
Reflexive(_)   lambda Ro beta alpha . [FORALL x alpha . R x x]
Symmetric(_)   lambda Ro beta alpha . [FORALL x alpha . FORALL y alpha . R x y  ->  R y x]
Transitive(_)   lambda Ro beta alpha . [FORALL x alpha . FORALL y alpha . FORALL z alpha . R x y & R y z  ->  R x z]
EquivRel(_)   lambda Ro beta alpha . [Reflexive(R) & Symmetric(R) & Transitive(R)]
EquivClasses(_)   lambda Ro alpha alpha . [lambda Ao alpha . EXISTS u alpha . u in A & FORALL v alpha . v in A  <->  R u v]
Defined Notions in Theory Function
Inj(_)   lambda F beta alpha . [FORALL x alpha , y beta . F(x) = F(y)  ->  x = y]
Surj(_)   lambda F beta alpha . [FORALL y beta . EXISTS x alpha . y = F(x)]
Bij(_)   lambda F beta alpha . Surj(F) & Inj(F)
InverseImageF(_,_)   lambda F beta alpha , Ao alpha . [lambda y beta . EXISTS x alpha . x in A & y = F(x)]
InverseImageF(_,_)   lambda F beta alpha , Bo beta . [lambda x alpha . EXISTS y beta . y in B & y = F(x)]
(_ o _)   lambda F beta alpha , G gamma beta . [lambda x alpha . G(F(x))]
IncreasingF(_,_,_)   lambda F beta alpha , <= 1o alpha alpha , <= 2o beta beta . [FORALL x alpha , y alpha . x <= 1 y  ->  F(x) <= 2 F(y) ]
DecreasingF(_,_,_)   lambda F beta alpha , <= 1o alpha alpha , <= 2o beta beta . [FORALL x alpha , y alpha . x <= 1 y  ->  F(y) <= 2 F(x)]
-------- -------------

2. Defined concepts occurring in problems from Table 1.

Concepts are defined in terms of lambda-expressions and they may contain other, already specified concepts. For presentation purposes, we use customary mathematical symbols  union , union , etc., for some concepts like union, intersection, etc., and we also use infix notation. For instance, the definition of union on sets can be easily read in its more common mathematical representation A  union  B := {x | x in A v x in B}. Before proving a problem, Leo always expands -- recursively, if necessary -- all occurring concepts. This straightforward expansion to first principles is realised by an automated preprocess in our current approach.

SET171+3 We first discuss example SET171+3 to contrast our formalisation to a standard first-order one. After recursively expanding the input problem, that is, completely reducing it to first principles, Leo turns it into a negated unit clause. Since this initial clause is not in normal form, Leo first normalises it with explicit clause normalisation rules to reach some proper initial clauses. In our concrete case, this normalisation process leads to the following unit clause consisting of a (syntactically not solvable) unification constraint (here Bo alpha ,Co alpha ,Do alpha are Skolem constants and Bx is obtained from expansion of x in B):

[(lambda x alpha . Bx v (Cx & Dx)) =? (lambda x alpha . (Bx v Cx) & (Bx v Dx))]

Note that negated primitive equations are generally automatically converted by Leo into unification constraints. This is why [(lambda x alpha . Bx v (Cx & Dx)) =? (lambda x alpha . (Bx v Cx) & (Bx v Dx))] is generated, and not [(lambda x alpha . Bx v (Cx & Dx)) = (lambda x alpha . (Bx v Cx) & (Bx v Dx))]F. Observe, that we write [.]T and [.]F for positive and negative literals, respectively. Leo then applies its goal directed functional and Boolean extensionality rules which replace this unification constraint by the negative literal (where x is a Skolem constant):

[(Bx v (Cx & Dx))  <->  ((Bx v Cx) & (Bx v Dx))]F
This unit clause is again not normal; normalisation, factorisation and subsumption yield the following set of clauses:
[Bx]F            [Bx]T v [Cx]T            [Bx]T v [Dx]T            [Cx]F v [Dx]F
This set is essentially of propositional logic character and trivially refutable. Leo needs 0.56 seconds for solving the problem and generates a total of 36 clauses.

Let us consider now this same example SET171+3 in its first-order formulation from the TPTP (see Table 3).

Assumptions:  (1)   FORALL B, C, x. [x in (B  union  C)  <->  x in B v x in C]
 (2)  FORALL B, C, x. [x in (B union C)  <->  x in B & x in C]
 (3)   FORALL B, C . [B = C  <->  B subset C & C subset B]
 (4)   FORALL B, C . [B  union  C = C  union  B]
 (5)   FORALL B, C . [B union C = C union B]
 (6)   FORALL B, C . [B subset C  <->  FORALL x . x in B  ->  x in C]
 (7)   FORALL B, C . [B = C  <->  FORALL x . x in B <->  x in C]
Proof Goal:  (8)   FORALL B, C, D . [B  union  (C union D) = (B union  C) union (B  union  D)]

3. TPTP problem Example[SET171+3] -- distributivity of  union  over union .

We can observe that the assumptions provide only a partial axiomatisation of naive set theory. On the other hand, the specification introduces lemmata that are useful for solving the problem. In particular, assumption (7) is trivially derivable from (3) with (6). Obviously, clausal normalisation of this first-order problem description yields a much larger and more difficult set of clauses. Furthermore, definitions of concepts are not directly expanded as in Leo. It is therefore not surprising that most first-order ATPs still fail to prove this problem. In fact, very few TPTP provers were successful in proving SET171+3. Amongst them are Muscadet 2.4. [Pastre01], Vampire 7.0, and Saturate. The natural deduction system Muscadet uses special inference rules for sets and needs 0.2 seconds to prove this problem. Vampire needs 108 seconds. The Saturate system [GanzingerStuber-03-cade] (which extends Vampire with Boolean extensionality rules that are a one-to-one correspondence to Leo's rules for Extensional Higher-Order Paramodulation [C5]) can solve the problem in 2.9 seconds while generating 159 clauses. The significance of such comparisons is clearly limited since different systems are optimised to a different degree. One noted difference between the experiments with first-order provers listed above, and the experiments with Leo and Leo-Bliksem is that first-order systems often use a case tailored problem representation (e.g., by avoiding some base axioms of the addressed theory), while Leo and Leo-Bliksem have a harder task of dealing with a general (not specifically tailored) representation.

For the experiments with Leo and the cooperation of Leo with the first-order theorem prover Bliksem, lambda-abstraction as well as the extensionality treatment inherent in Leo's calculus [J5] is used. This enables a theoreticallyFor pragmatic reasons, such as efficiency, most of Leo's tactics are incomplete. Leo's philosophy is to rely on a theoretically complete calculus, but to practically provide a set of complimentary strategies so that these cover a broad range of theorems. Henkin-complete proof system for set theory. In the above example SET171+3, Leo generally uses the application of functional extensionality to push extensional unification constraints down to base type level, and then eventually applies Boolean extensionality to generate clauses from them. These are typically much simpler and often even propositional-like or first-order-like (FO-like, for short), that is, they do not contain any `real' higher-order subterms (such as a lambda-abstraction or embedded equations), and are therefore suitable for treatment by a first-order ATP or even a propositional logic decision procedure.

SET624+3 Sometimes, extensionality treatment is not required and the originally higher-order problem is immediately reduced to only FO-like clauses. For example, after expanding the definitions, problem SET624+3 yields the following clause (where Bo alpha ,Co alpha ,Do alpha are again Skolem constants): ]F Normalisation results in 26 FO-like clauses, which present a hard problem for Leo: it needs approx. 35 seconds (see Sec. 4) to find a refutation, whereas first-order ATPs only need a fraction of a second.

SET646+3 Sometimes, problems are immediately refuted after the initial clause normalisation. For example, after definition expansion in problem SET646+3 we get the following clause (where Bo alpha ,Co alpha ,x alpha are again Skolem constants):

[Ax  ->  (FORALL y beta . By  ->  (FORALL u alpha . FORALL v beta . (u = x & v = y)  ->  ((NOT  F) & (NOT  F))))]F
Normalisation in Leo immediately generates a basic refutation (i.e., a clause [F]Tv [F]T) without even starting proof search.

SET611+3 The examples discussed so far all essentially apply extensionality treatment and normalisation to the input problem in order to immediately generate a set of inconsistent FO-like clauses. Problem SET611+3 is more complicated as it requires several reasoning steps in Leo before the initially consistent set of available FO-like clauses grows into an inconsistent one. After definition expansion, Leo is first given the input clause:

[FORALL Ao alpha , Bo alpha . (lambda x alpha . (Ax & Bx)) = (lambda x alpha . F))  <->  (lambda x alpha . (Ax & NOT  Bx)) = (lambda x alpha . Ax)]F
which it normalises into:
[(lambda x alpha . (Ax & Bx)) =? (lambda x alpha . F)] v [(lambda x alpha . (Ax & NOT  Bx)) =? (lambda x alpha . Ax)]                    (9)
[(lambda x alpha . (Ax & Bx)) = (lambda x alpha . F)]T v [(lambda x alpha . (Ax & NOT  Bx)) = (lambda x alpha . Ax)]T                     (10)
As mentioned before, the unification constraint (9) corresponds to:
[(lambda x alpha . (Ax & Bx)) = (lambda x alpha . F)]F v [(lambda x alpha . (Ax & NOT  Bx)) = (lambda x alpha . Ax)]F                     (11)
Leo has to apply to each of these clauses and to each of their literals appropriate extensionality rules. Thus, several rounds of Leo's set-of-support-based reasoning procedure are required, so that all necessary extensionality reasoning steps are performed, and sufficiently many FO-like clauses are generated which can be refuted by Bliksem.

In summary, each of the examples discussed in this section exposes a motivation for our higher-order/first-order cooperative approach to theorem proving. In particular, they show that:

3.  Higher-Order/First-Order Cooperation via Oants

The cooperation between higher-oder and first-order reasoners, which we investigate in this paper, is realised in the concurrent hierarchical blackboard architecture Oants [BeSo98b]. We first describe in Sec. 3.1 the existing Oants architecture. In order to overcome some of its problems, in particular efficiency problems, we devised within Oants a new and improved cooperation method for the higher-order ATP Leo and first-order provers (in particular, Bliksem) - we describe this in Sec. 3.2. We address the question of how to generate the necessary clauses in Sec. 3.3, and discuss soundness and completeness of our implementation of the higher-order/first-order cooperation in Sec. 3.4.

3.1.  Oants

Oants was originally conceived to support interactive theorem proving but was later extended to a fully automated proving system [Sorge01phd,BeSo01]. Its basic idea is to compose a central proof object by generating, in each proof situation, a ranked list of potentially applicable inference steps. In this process, all inference rules, such as calculus rules or tactics, are uniformly viewed with respect to three sets: premises, conclusions, and additional parameters. The elements of these three sets are called arguments of the inference rule and they usually depend on each other. An inference rule is applicable if at least some of its arguments can be instantiated with respect to the given proof context. The task of the Oants architecture is now to determine the applicability of inference rules by computing instantiations for their arguments.

The architecture consists of two layers. On the lower layer, possible instantiations of the arguments of individual inference rules are computed. In particular, each inference rule is associated with its own blackboard and concurrent processes, one for each argument of the inference rule. The role of every process is to compute possible instantiations for its designated argument of the inference rule, and to record these on the blackboard. The computations are carried out with respect to the given proof context and by exploiting information already present on the blackboard, that is, argument instantiations computed by other processes. On the upper layer, the information from the lower layer is used for computing and heuristically ranking the inference rules that are applicable in the current proof state. The most promising rule is then applied to the central proof object and the data on the blackboards is cleared for the next round of computations.

Oants employs resource reasoning to guide search.Oants provides facilities to define and modify the processes at run-time. But notice that we do not use these advanced features in the case study presented in this paper. This enables the controlled integration (e.g., by specifying time-outs) of full-fledged external reasoning systems such as automated theorem provers, computer algebra systems, or model generators into the architecture. The use of the external systems is modelled by inference rules, usually one for each system. Their corresponding computations are encapsulated in one of the independent processes in the architecture. For example, an inference rule modelling the application of an ATP has its conclusion argument set to be an open goal. A process can then place an open goal on the blackboard, where it is picked up by a process that applies the prover to it. Any computed proof or partial-proof from the external system is again written to the blackboard from where it is subsequently inserted into the proof object when the inference rule is applied. While this setup enables proof construction by a collaborative effort of diverse reasoning systems, the cooperation can only be achieved via the central proof object. This means that all partial results have to be translated back and forth between the syntaxes of the integrated systems and the language of the proof object. Since there are many types of integrated systems, the language of the proof object -- a higher-order language even richer than Leo's, together with a natural deduction calculus -- is expressive but also cumbersome. This leads not only to a large communication overhead, but also means that complex proof objects have to be created (large clause sets need to be transformed into large single formulae to represent them in the proof object; the support for this in Oants to date is inefficient), even if the reasoning of all systems involved is clause-based. Consequently, the cooperation between external systems is typically rather inefficient [BeJaKeSo01b].

3.2.   Cooperation via a single inference rule

In order to overcome the problem of the communication bottleneck described above, we devised a new method for the cooperation between a higher-order and a first-order theorem prover within Oants. Rather than modelling each theorem prover as a separate inference rule (and hence needing to translate the communication via the language of the central proof object), we model the cooperation between a higher-order (concretely, Leo) and a first-order theorem prover (in our case study Bliksem) in Oants as a single inference rule. The cooperation between these two theorem provers is carried out directly and not via the central proof object. This avoids translating clause sets into single formulae and back. While in our previous approach the cooperation between Leo and an FO-ATP was modelled at the upper layer of the Oants architecture, our new approach presented in this paper models their cooperation by exploiting the lower layer of the Oants blackboard architecture. This is not an ad hoc solution, but rather, it demonstrates Oants's flexibility in modelling the integration of cooperative reasoning systems.

Concretely, the single inference rule modelling the cooperation between Leo and a first-order theorem prover needs four arguments to be applicable: (1) an open proof goal, (2) a partial Leo proof, (3) a set of FO-like clauses in the partial proof, (4) a first-order refutation proof for the set of FO-like clauses. Each of these arguments is computed, that is, its instantiation is found, by an independent process. The first process finds open goals in the central proof object and posts them on the blackboard associated with the new rule. The second process starts an instance of the Leo theorem prover for each new open goal on the blackboard. Each Leo instance maintains its own set of FO-like clauses. The third process monitors these clauses, and as soon as it detects a change in this set, that is, if new FO-like clauses are added by Leo, it writes the entire set of clauses to the blackboard. Once FO-like clauses are posted, the fourth process first translates each of the clauses directly into a corresponding one in the format of the first-order theorem prover, and then starts the first-order theorem prover on them. Note that writing FO-like clauses on the blackboard is by far not as time consuming as generating higher-order proof objects. As soon as either Leo or the first-order prover finds a refutation, the second process reports Leo's proof or partial proof to the blackboard, that is, it instantiates argument (2). Once all four arguments of our inference rule are instantiated, the rule can be applied and the open proof goal can be closed in the central proof object. That is, the open goal can be proved by the cooperation between Leo and a first-order theorem prover. When computing applicability of the inference rule, the second and the fourth process concurrently spawn processes running Leo or a first-order prover on a different set of FO-like clauses. Thus, when actually applying the inference rule, all these instances of provers working on the same open subgoal are stopped.

The cooperation can be carried out between any first-order theorem prover and Leo instantiated with any strategy, thus resulting in different instantiations of the inference rule discussed above. While several first-order provers are integrated in Oants and could be used, Bliksem was sufficient for the case study reported in this paper (see Sec. 4). In most cases, more than one Bliksem process was necessary. But as the problems were always concerned with only one subgoal, only one Leo process had to be started.

Our approach to the cooperation between a higher-order and a first-order theorem prover has many advantages. The main one is that the communication is restricted to the transmission of clauses, and thus it avoids intermediate translation into the language of the central proof object. This significantly reduces the communication overhead and makes effective proving of more involved theorems feasible. A disadvantage of this approach is that we cannot easily translate and integrate the two proof objects produced by Leo and Bliksem into the central proof object maintained by Oants, as is possible when applying only one prover per open subgoal. Providing such translation remains future work. The repercussions will be discussed in more detail in Sec. 3.4.

3.3.  Extracting FO-like clauses from Leo

Crucial to a successful cooperation between Leo and a first-order ATP is obviously the generation of FO-like clauses. Leo always maintains a heap of FO-like clauses. In the current Leo system this heap remains rather small since Leo's standard calculus intrinsically avoids primitive equality and instead provides a rule that replaces occurrences of primitive equality with their corresponding Leibniz definitions which are higher-order. The Leibniz principle defines equality as follows =o alpha alpha := lambda x alpha . lambda y alpha . [FORALL Po alpha . Px  ->  Py]. Leo also provides a rule which replaces syntactically non-unifiable unification constraints between terms of non-Boolean base type by their respective representations that use Leibniz equality. While the clauses resulting from these rules are still refutable in Leo, they are not refutable by Bliksem without adding set theory axioms. We illustrate the effect by the following simple example, where a i , b i , and f i i are constants:

a = b  ->  f(a) = f(b)
Depending on whether we work with primitive equality or Leibniz equality this problem is reduced to the clause sets in either (12) or (13) respectively (in the latter Po i is a new free variable, and Qo i is a new Skolem constant):
[a = b]T [f(a) =? f(b)]                     (12)
[ Pa]F v [ Pb]T [Q(f(a))]T [Q(f(b))]F                     (13)
While the former is obviously refutable in Bliksem, the latter is not. Leo, however, still finds a refutation for the latter and generates the crucial substitution Parrow lambda x alpha . Q(f(x)) by higher-order pre-unification.

To circumvent this problem, we adapted the relevant rules in Leo. Instead of immediately constructing Leibniz representation of clauses, an intermediate representation containing primitive equality is generated and dumped on the heap of FO-like clauses. As a consequence, additional useful FO-like clauses are accumulated and the heap can become quite large, in particular, since we do not apply any subsumption to the set of FO-like clauses (this is generally done more efficiently by a first-order ATP anyway). Recent research has shown that Leibniz equality is generally very bad for automating higher-order proof search. Thus, future work in Leo includes providing support for full primitive equality and avoiding Leibniz equations.

3.4.  Soundness and completeness of the cooperation

Clearly, soundness and completeness properties depend on the corresponding properties of the systems involved, in our case, of Leo and Bliksem.

The general philosophy of Oants is to ensure the correctness of proofs by the generation of explicit proof objects, which can be checked independently from the proof generation. In particular, reasoning steps of ATPs have to be translated into Oants's natural deduction calculus via the Tramp proof transformation system [MeierCade00] to be machine-checkable. Since the cooperative proof result of Leo-Bliksem cannot yet be directly inserted into the centralised proof object, the generation of a machine-checkable proof object is not yet supported. One possible solution is to insert Bliksem proofs into Leo proofs at the right places. Then, the modified Leo proofs can be inserted into the centralised proof object, and hence, explicit proof objects can be generated by Oants. In principle, there is no problem with this, however, it is not yet implemented.

While there are many advantages in guaranteeing correctness of proofs by checking them, it is worth noting that the combination of Leo and Bliksem is sound under the assumption that the two systems are sound. Namely, to prove a theorem it is sufficient to show that a subset of clauses generated in the proof is inconsistent. If Leo generates an inconsistent set of clauses, then it does so correctly by assumption, be it a FO-like set or not. Assuming that the translation from FO-like clauses to truly first-order clauses preserves consistency/inconsistency, then a set of clauses that is given to Bliksem is inconsistent only if Leo generated an inconsistent set of clauses in the first place. By the assumption that Bliksem is sound follows that Bliksem will only generate the empty clause when the original clause set was inconsistent.

Thus, soundness of our cooperative approach critically relies only on the soundness of the selected transformational mapping from FO-like clauses to proper first-order clauses. We use the mapping from Tramp, which has been previously shown to be sound and is based on [Kerber-PhD]. Essentially, it injectively maps expressions such as P(f(a)) to expressions such as @1 pred(P,@1 fun(f,a)), where the @ are new first-order operators describing function and predicate application for particular types and arities. The injectivity of the mapping guarantees soundness, since it allows each proof step to be mapped back from first-order to higher-order. Hence, our higher-order/first-order cooperative approach between Leo and Bliksem is sound.

Completeness (in the sense of Henkin completeness) can in principle be achieved in higher-order systems, but practically, the strategies used are typically not complete for efficiency reasons. Let us assume that we use a complete strategy in Leo. All that our procedure does is pass FO-like clauses to Bliksem. Hence, no proofs can be lost in this process. That is, completeness follows trivially from the completeness of Leo.

The more interesting question is whether particular cooperation strategies will be complete as well. For instance, in Leo we may want to give higher preference to real higher-order steps which guarantee the generation of first-order clauses.

4.  Experiments and Results

We conducted several experiments to evaluate our hybrid reasoning approach. In particular, we concentrated on problems given in Table 1. We investigated several Leo strategies in order to compare Leo's individual performance with the performance of the Leo-Bliksem cooperation. Our example set differs from the one in [GanzingerStuber-03-cade] in that it contains some additional problems, and it also omits an entry for problem SET108+1. This problem addresses the universal class and can therefore not be formalised in type theory in the same concise way as the other examples, but only in a way very similar to the one given in TPTP.

Table 4 presents the results of our experiments. All timings given in the table are in seconds. The first column contains the TPTP identifier of the problem. The second column relates some of the problems to their counterparts in the Journal of Formalized Mathematics (JFM; see where they originally stem from. This eases the comparison with the results in [C3,T2], where the problems from the JFM article Boolean Properties of Sets were already solved: the problems are named with prefix `B:'. Prefix `RS1:' stands for the JFM article Relations Defined on Sets. The third column lists the TPTP (v3.0.1 as of 20 January 2005, see difficulty rating of the problem, which indicates how hard the problem is for first-order ATPs (difficulty rating 1.00 indicates that no TPTP prover can solve the problem).

The fourth, fifth and sixth columns list whether Saturate, Muscadet (v2.4) and E-Setheo (csp04), respectively, can (+) or cannot (-) solve a problem. The seventh column lists the timing results for Vampire (v7). The results for Saturate are taken from [GanzingerStuber-03-cade] (a `?' in Table 4 indicates that the result was not listed in [GanzingerStuber-03-cade] and is thus unavailable). The results for Muscadet and E-Setheo are taken from the on-line version of the solutions provided with the TPTP. Since the listed results were obtained from different experiments on different platforms, their run-time comparison would be unfair, and was thus not carried out. The timings for Vampire, on the other hand, are based on private communication with A. Voronkov and they were obtained on a computer with a very similar specification as we used for the Leo-Bliksem timings. Note, that the results for Vampire and E-Setheo reported in [GanzingerStuber-03-cade] differ for some of the problems to the ones in TPTP. This is probably due to different versions of the systems tested, for instance, the TPTP uses Vampire version 7, while the results reported in [GanzingerStuber-03-cade] are based on version 5. The results in columns four through to seven show that some problems are still very hard for first-order ATPs, as well as for the special purpose theorem prover Muscadet. Column eight and nine in Table 4 list the results for Leo alone and Leo-Bliksem, respectively. Each of these two columns is further divided into sub-columns to allow for a detailed comparison.

TPTP-   Mizar  Diffi-   Satu-   Mus   E-Se-   Vamp-  c||LEO   cLEO-Bliksem
Problem   Problem   culty   rate   cadet   theo   ire 7   Strat.   Cl.   Time   Cl.   Time   FOcl   FOtm   GnCl
-------- -------------
SET014+4     .67   +   +   +   .01   ST   41   .16   34   6.76   19   .01   7
SET017+1     .56   -   -   +   .03   EXT   3906   57.52   25   8.54   16   .01   74
SET066+1     1.00   ?   -   -   -   -   -   -   26   6.80   20   10   56
SET067+1     .56   +   +   +   .04   ST   6   .02   13   .32   16   .01   12
SET076+1     .67   +   -   +   .00   -   -   -   10   .47   18   .01   35
SET086+1     .22   +   -   +   .04  ST   4   .01   4   .01   N/A   N/A   N/A
SET096+1     .56   +   -   +   .03   -   -   -   27   7.99   14   .01   25
SET143+3   B:67   .67   +   +   +   68.71  EIR   37   .38   33   7.93   18   .01   19
SET171+3   B:71   .67   +   +   -   108.31  EIR   36   .56   25   4.75   19   .01   20
SET580+3   B:23   .44   +   +   +   14.71  EIR   25   .19   6   2.73   8   .01   13
SET601+3   B:72   .22   +   +   +   168.40  EIR   145   2.20   55   4.96   8   .01   13
SET606+3   B:77   .78   +   -   +   62.02  EIR   21   .33   17   10.8   15   .01   5
SET607+3   B:79   .67   +   +   +   65.57  EIR   22   .31   17   7.79   15   .01   6
SET609+3   B:81   .89   +   +   -   161.78  EIR   37   .60   26   6.50   19   10   17
SET611+3   B:84   .44   +   -   +   60.20  EIR   996   12.69   72   32.14   38   .01   101
SET612+3   B:85   .89   +   -   -   113.33  EIR   41   .54   18   3.95   6   .01   7
SET614+3   B:88   .67   +   +   -   157.88  EIR   38   .46   19   4.34   16   .01   17
SET615+3   B:89   .67   +   +   -   109.01  EIR   38   .57   17   3.59   6   .01   9
SET623+3   B:99   1.00   ?   -   -   -   EXT   43   8.84   23   9.54   10   .01   14
SET624+3   B:100   .67   +   -   +   .04  ST   4942   34.71   54   9.61   46   .01   212
SET630+3   B:112   .44   +   -   +   60.39  EIR   11   .07   6   .08   8   10   4
SET640+3   RS1:2   .22   +   -   +   70.41  EIR   2   .01   2   .01   N/A   N/A   N/A
SET646+3   RS1:8   .56   +   -   +   59.63  EIR   2   .01   2   .01   N/A   N/A   N/A
SET647+3   RS1:9   .56   +   -   +   64.21  EIR   26   .15   13   .30   13   .01   15
SET648+3   RS1:10   .56   +   -   +   64.22  EIR   26   .15   14   .30   13   .01   16
SET649+3   RS1:11   .33   -   -   +   63.77  EIR   45   .30   29   5.49   12   .01   16
SET651+3   RS1:13   .44   -   -   +   63.88  EIR   20   .10   11   .16   10   10   11
SET657+3   RS1:19   .67   +   -   +   1.44  EIR   2   .01   2   .01   N/A   N/A   N/A
SET669+3   RS1:19   .22   -   -   +   .34  EI   35   .22   35   .23   N/A   N/A   N/A
SET670+3   RS1:33   1.00   ?   -   -   -   EXT   15   .17   17   .36   16   .01   6
SET671+3   RS1:34   .78   -   -   +   218.02  EIR   78   .64   7   2.71   10   .01   14
SET672+3   RS1:35   1.00   ?   -   -   -   EXT   27   .4   30   .70   21   .01   11
SET673+3   RS1:36   .78   -   -   +   47.86  EIR   78   .65   14   5.66   14   .01   16
SET680+3   RS1:47   .33   +   -   +   .07  ST   185   .88   29   4.61   18   .01   24
SET683+3   RS1:50   .22   +   -   +   .06  ST   46   .20   35   8.90   18   10   24
SET684+3   RS1:51   .78   -   -   +   .33  ST   275   2.45   46   5.95   26   .01   47
SET686+3   RS1:53   .56   -   -   +   .11  ST   274   2.36   46   5.37   26   .01   46
SET716+4     .89   +   +   -   -  ST   39   .45   18   3.81   18   .01   118
SET724+4     .89   +   +   -   -  EXT   154   2.75   18   7.21   15   10   23
SET741+4     1.00   ?   -   -   -   -   -   -   -   -   -   -   -
SET747+4     .89   -   +   -   -  ST   34   .46   25   1.11   18   10   10
SET752+4     .89   ?   +   -   -   -   -   -   50   6.60   48   .01   4363
SET753+4     .89   ?   +   -   -   -   -   -   15   3.07   12   10   19
SET764+4     .56   +   +   +   .02  EI   9   .05   8   .04   N/A   N/A   N/A
SET770+4     .89   +   +   -   -   -   -   -   -   -   -   -   -
4. Experimental data for the benchmark problems given in Table 1.

All our experiments (for the values of Leo and Leo-Bliksem) were conducted on a 2.4 GHz Xenon machine with 1GB of memory and an overall time limit of 100 seconds. For our experiments with Leo alone in column eight in Table 4 we tested four different strategies. Mainly, they differ in their treatment of equality and extensionality. This ranges from immediate expansion of primitive equality with Leibniz equality and limited extensionality reasoning, STANDARD (ST), to immediate expansion of primitive equality and moderate extensionality reasoning, EXT, to delayed expansion of primitive equality and moderate extensionality reasoning, EXT-INPUT (EI), and finally to delayed expansion of primitive equality and advanced recursive extensionality reasoning, EXT-INPUT-RECURSIVE (EIR). Column eight in Table 4 presents the fastest strategy for a respective problem (Strat.), the number of clauses generated by Leo (Cl.), and the total runtime (Time). While occasionally there were more than one Leo strategy that could solve a problem, it should be noted that none of the strategies was successful for all the problems solved by Leo.

In contrast to the experiments with Leo alone, we used only the EXT-INPUT strategy for our experiments with the Leo-Bliksem cooperation. Column nine in Table 4 presents the number of clauses generated by Leo (Cl.) together with the time (Time), and in addition, the number of first-order clauses sent to Bliksem (FOcl), the time used by Bliksem (FOtm), and the number of clauses generated by Bliksem (GnCl). Note, that we give the data only for the first instance that Bliksem actually succeeded in solving the problem. This time also includes the time needed to write and process input and output files over the network. While Leo and instances of Bliksem were running in separate threads (each run of Bliksem was given a 50 second time limit), the figures given in the `Time' column reflect the overall time needed for a successful proof. That is, it contains the time needed by all concurrent processes: Leo's own process as well as those processes administering the various instances of Bliksem. Since all these processes ran on a single processor, there is potential to ameliorate the overall runtimes by using real multiprocessing.

Note also, that the number of clauses in Leo's search space is typically low since subsumption is enabled. Subsumption, however, was not enabled for the accumulation of FO-like clauses in Leo's bag of FO-like clauses. This is why there are usually more clauses in this bag (which is sent to Bliksem) than there are available in Leo's search space. Finally, observe that for some problems a refutation was found after Leo's clausal normalisation, and therefore Bliksem was not applicable (N/A).

While Leo itself can solve a majority of the considered problems with some strategy, the Leo-Bliksem cooperation can solve more problems and, moreover, needs only a single Leo strategy. We can also observe that for many problems that appear to be relatively hard for Leo alone (e.g., SET017+1, SET611+3, SET624+3), the Leo-Bliksem cooperation solves them not only more quickly, but also it sometimes reduces the problems to relatively small higher-order pre-processing steps with subsequent easy first-order proofs, as for instance, in the case of SET017+1.

From a mathematical viewpoint the investigated problems are trivial and, hence, they should ideally be reliably and very efficiently solvable within a proof assistant. This has been achieved for the examples in Table 4 (except for SET741+4 and SET770+4) by our hybrid approach. While some of the proof attempts now require slightly more time than when using Leo alone with a specialised strategy, they are, in most cases, still faster than when proving with a first-order system.

5.  Related Work and Conclusion

Related to our approach is the Techs system [DeFu99], which realises a cooperation between a set of heterogeneous first-order theorem provers. Similarly to our approach, partial results in Techs are exchanged between the different theorem provers in form of clauses. The main difference to the work of Denzinger et al. (and other related architectures like [FisIre:mapp98]) is that our system bridges between higher-order and first-order automated theorem proving. Also, unlike in Techs, we provide a declarative specification framework for modelling external systems as cooperating, concurrent processes that can be (re-)configured at run-time. Related is also the work of Hurd [hurd2002e] which realises a generic interface between HOL and first-order theorem provers. It is similar to the solution previously achieved by Tramp [MeierCade00] in Omega, which serves as a basis for the sound integration of ATPs into Oants. Both approaches pass essentially first-order clauses to first-order theorem provers and then translate their results back into HOL resp. Omega. Some further related work on the cooperation of Isabelle with Vampire is presented in [DBLP:conf/cade/MengP04]. The main difference of our work to the related systems is that while our system calls first-order provers from within higher-order proof search, this is not the case for [hurd2002e,MeierCade00,DBLP:conf/cade/MengP04].

One of the motivations for our work is to show that the cooperation of higher-order and first-order automated theorem provers can be very successful and effective. The results of our case study provide evidence for this: our non-optimised system outperforms related work on state-of-the-art first-order theorem provers and their ad hoc extensions such as Saturate [GanzingerStuber-03-cade] on 45 mathematical problems chosen from the TPTP SET category. Among them are four problems which cannot be solved by any TPTP system to date. In contrast to the first-order situation, these problems can in fact be proved in our approach reliably from first principles, that is, without avoiding relevant base axioms of the underlying set theory, and moreover, without the need to provide relevant lemmata and definitions by hand.

The results of our case study motivate further research in the automation of higher-order theorem proving and the experimentation with different higher-order to first-order transformation mappings (such as the ones used by Hurd) that support our hybrid reasoning approach. They also provide further evidence for the usefulness of the Oants approach as described in [BeSo01,BeJaKeSo01b] for flexibly modelling the cooperation of reasoning systems.

Our results also motivate the need for a higher-order extension of the TPTP library in which alternative higher-order problem formalisations are linked with their first-order counterparts so that first-order theorem provers could also be evaluated against higher-order systems (and vice versa).

Future work is to investigate how far our approach scales up to more complex problems and more advanced mathematical theories. In less trivial settings as discussed in this paper, we will face the problem of selecting and adding relevant lemmata to avoid immediate reduction to first principles and to appropriately instantiate set variables. Relevant related work for this setting is Bishop's approach to selectively expand definitions as presented in [BiAn98] and Brown's PhD thesis on set comprehension in Church's type theory [Brown04].

For advice and help we thank Chad Brown, Andreas Meier, Andrei Voronkov, and Claus-Peter Wirth.


© 2005, Springer-Verlag. LNAI 3452, p.415-431. LPAR, Christoph Benzmüller, Volker Sorge, Mateja Jamnik, Manfred Kerber
The URL of this page is