Intuitive and Formal Representations:
The Case of Matrices

Martin Pollet1,2
The author's work is supported by EU IHP grant Calculemus HPRN-CT-2000-00102.
Volker Sorge2
The author's work is supported by a Marie-Curie Grant from the European Union.
Manfred Kerber2


1 Fachbereich Informatik, Universität des Saarlandes, Germany
http://www.ags.uni-sb.de/~pollet
2 School of Computer Science, The University of Birmingham, England
http://www.cs.bham.ac.uk/~vxs, http://www.cs.bham.ac.uk/~mmk

ABSTRACT

A major obstacle for bridging the gap between textbook mathematics and formalising it on a computer is the problem how to adequately capture the intuition inherent in the mathematical notation when formalising mathematical concepts. While logic is an excellent tool to represent certain mathematical concepts it often fails to retain all the information implicitly given in the representation of some mathematical objects. In this paper we concern ourselves with matrices, whose representation can be particularly rich in implicit information. We analyse different types of matrices and present a mechanism that can represent them very close to their textbook style appearance and captures the information contained in this representation but that nevertheless allows for their compilation into a formal logical framework. This firstly allows for a more human-oriented interface and secondly enables efficient reasoning with matrices.

1.  Introduction

A big challenge for formalising mathematics on computers is still to choose a representation that is on the one hand close to that in mathematical textbooks and on the other hand sufficiently formal in order to perform formal reasoning. While there has been much work on intermediate representations via a `mathematical vernacular' [Bruijn94,WeWi02,KamareddineNederpelt2004], most of this work concentrates on representing mathematical proofs in a way that closely resembles the human reasoning style. Only little attention has been paid to an adequate representation of concrete mathematical objects, which captures all the information and intuition that comes along with their particular notation. Logicians are typically happy with the fact that such concepts can be represented in some way, whereas users of a formal system are more concerned with the question, how to represent a concept and how much effort is necessary to represent it. Depending on the purpose of the representation it is also important how easy it is to work with it.

In this paper we examine one particular type of mathematical objects, namely matrices. Let us first take a closer look how the matrix concept is introduced in a mathematical textbook. Lang [Lang84, p.441] writes:

"By an m x n matrix in R one means a doubly indexed family of elements of R, (aij), (i=1,... ,m and j=1,... ,n), usually written in the form

a11   ...   a1n
  ...  
am1   ...   amn
We call the elements aij the coefficients or components of the matrix."

Mathematicians do not work with the definition alone. The definition already introduces the representation as a rectangular form in which the elements of a matrix are ordered with respect to their indices. Matrices can be viewed as collections of row or column vectors, as block matrices, and various types of ellipses are used to describe the form of a matrix. The different representations are used to make the relevant information directly accessible and ease reasoning.

Depending on the exact logical language, one would consider a matrix as a tuple consisting of a double indexed function, number of rows, number of columns, and the underlying ring. And opposed to mathematics, one has to stick to this definition during all proofs. The (logical) view that a matrix is a tuple, which mainly bears aspects of a function, is not adequate from a mathematical point of view. If we look at a concrete matrix such as a 2 x 2 matrix containing only the zero element this matrix Z is a constant. This means in particular that for any matrix M, the product M* Z is equal to Z without the necessity to do reasoning about tuples and lambda expression. This is analogous to the relationship between the formal logical and mathematical view of the natural number four, which logically is the ground term s(s(s(s(0)))), while mathematically it is the constant symbol 4.

In this paper we show how to abstract from the functional representation of concrete matrices and how to attach structural information for matrices to the formal representation by so-called annotated constants. The structural information can be used for reasoning, which simplifies proof construction since some of the reasoning steps can be expressed as computations. The connection to the formal content allows the verification of the abstract proof steps in the underlying logical calculus.

In the next section we will have a closer look at different types of matrices that we want to be able to represent. In section 3 we will introduce annotated constants as an intermediate representation for mathematical objects. In section 4 we will discuss how concrete matrices, block matrices and ellipses can be represented and manipulated as annotated constants. We conclude with a discussion of related and future work in section 5.

2.  Matrices - Examples

In this section we give an overview of some important cases of matrices and their representations as they appear in algebra books (e.g. [Lang84,Koecher83]). We do not intend to give an exhaustive overview. However, we believe that the cases covered here will allow for generalisations and adaptations to others. We discuss some of the representational issues, for which we propose solutions in the subsequent sections of the paper.

2.1.  Concrete Matrices

A matrix is a finite set of entries arranged in a rectangular grid of rows and columns. The number of rows and columns of a matrix is often called the size of that matrix. The entries of a matrix are usually elements belonging to some algebraic field or ring. Matrices often occur in a very concrete form. That is, the exact number of rows and columns as well as the single entries are given. An example is the following 2 x 3-matrix: M=

3   2   7
1   0   4
Matrices of this form are fairly easy to represent and handle electronically. They can simply be linearised into a list of rows, which is indeed the standard input representation of matrices for most mathematical software, such as computer algebra systems. Since both the size of the matrix is determined and all the elements are given and of a specific type, manipulations of the matrix and computations with the matrix can be efficiently performed, even if the concrete numbers are replaced by indeterminates such as a,b,c.

While concrete matrices often occur in many engineering disciplines, pure mathematics goes normally beyond concrete sizes, but will speak of matrices in a more generalised fashion.

2.2.  Block matrices

Block matrices are matrices of fixed sizes, typically 2 x 2 or 3 x 3, whose elements consist of rectangular blocks of elements of not necessarily determined size. Thus, block matrices are in effect shorthand for much larger structures, whose internal format can be captured in a particular pattern of blocks. Consider, for instance, the general matrix of size (n+1) x (n+1) given as M=

a   vT
0   A
in which a =/=  0 is a ring element (scalar), 0 is the zero vector of size n, vT is the transpose of an arbitrary vector v of size n, and A is a matrix of size n x n.

A block matrix can be emulated with a concrete matrix by regarding its elements as matrices themselves. This can be achieved by identifying, scalars with 1 x 1 matrices, vectors with n x 1 matrices, and transposed vectors with 1 x n matrices. While this enables the use of the techniques available for concrete matrices to input and represent block matrices, manipulating block matrices is not as straightforward. Since the elements do no longer belong to the same algebraic ring (or indeed to any ring), computations can only be carried out with respect to a restricted set of axioms. In particular, one has to generally forgo commutativity when computing with the single blocks. Again this can be simulated to a certain extend. For instance, the inverse of the above matrix can be computed by using a computer algebra system that can deal with non-commutativity, as demonstrated in section 4.2. Computations concerning two block matrices, such as matrix addition or multiplication can be simulated as well. However care has to be taken that the sizes of the blocks are compatible.

2.3.  Ellipses

While block matrices can capture simple patterns in a matrix in an abstract way, more complex patterns in generalised matrices are usually described using ellipses. Consider for instance the definition of the following matrix A: A=

a11   b   ...   b
0  
.
   .
      .
 
.
   .
      .
 
.
.
.

.
.
.
 
.
   .
      .
 
.
   .
      .
  b
0   ...   0   ann
The representation stands for an infinite class of n x n matrices such that we have a diagonal with elements aii, 1<= i<= n, all elements below the diagonal are zero, while the elements above the diagonal are all b. A matrix of this form is usually called an upper triangle matrix.

In the context of matrices we can distinguish essentially two types of ellipses:

  1. Ellipses denoting an arbitrary but fixed number of occurrences of the same element, such as in (0... 0).
  2. Ellipses representing a finite number of similar elements that are enumerated with respect to a given index set (a1... an)Here the notion of an index set is not necessarily restricted to being only a set of indices for a family of terms, but we also use it with a more general meaning of enumerating a sequence of elements as for instance in the vector containing the first n powers of a, (a1 ... an)..
Both types of ellipses are primarily placeholders for a finite set of elements that are either directly given (1) or can be inferred given the index set (2).

Another important feature of ellipses in the context of matrices are their orientation. While for most mathematical objects, such as sets or vectors, ellipses can occur in exactly one possible orientation, in two-dimensional objects such as matrices we can distinguish three different orientations: horizontal, vertical, and diagonal. Thereby ellipses are not only placeholders for single rows, columns or diagonals but a combination of ellipses together can determine the composition of a whole area within the matrix. For example the interaction of the diagonal, horizontal and vertical ellipses between the three 0s in A determines that the whole area underneath the main diagonal defaults to 0.

Matrices with ellipses are well suited for manipulation and reasoning at an intuitive abstract level. However, already their mere representation poses some problems. While they can be linearised with some effort, a two-dimensional representation of the object is preferable, as this eases to determine the actual meaning of the occurring ellipses. It is even more challenging to mechanise abstract reasoning or abstract computations on matrices with ellipses.

2.4.  Generalised Matrices

While ellipses already provide a powerful tool to express matrices in a very general form by specifying a large number of possible patterns, one sometimes wants to be even more general than that. Consider for instance the following definition of matrix B: B=

a11   *   ...   *
0  
.
   .
      .
 
.
   .
      .
 
.
.
.

.
.
.
 
.
   .
      .
 
.
   .
      .
  *
0   ...   0   ann
. Also written as:
a11     *
 
.
   .
      .
 
0     ann
Matrix B is very similar to A above, with the one exception that the elements above the main diagonal are now arbitrary, indicated by * , rather than b. This is a further generalisation as B now describes an upper triangle matrix of variable sizes n x n, where we are only interested in the elements of the main diagonal aii, 1<= i<= n, but we don't care about the elements above the diagonal. While such a matrix can be represented with the same representational tools as the matrix A above, it will be more difficult to deal with when we actually want to compute or reason with it.

In the following we shall describe how we can handle concrete matrices, block matrices, and ellipses. In order to extend our approach to cover generalised matrices as well, we would need to handle "don't care" symbols, in addition. We do not go into this question in this paper.

3.  Intermediate Representation - Annotated Constants

In this section we present the concept of annotated constants, a mechanism that provides a representational layer that can both capture the properties of the intuitive mathematical representation of objects, as well as connect these objects to their corresponding representation in a formal logic framework. Annotated constants are implemented in the Omega system [Omega02] and therefore the logical framework is Omega's simply typed lambda calculus (cf. [Andrews02]). We have first introduced annotated constants in [PoSo03] in the context of mathematical objects, such as numbers, lists, and permutations. For the sake of clarity we explain the idea in the following using the much simpler example of finite sets.

Let us assume a logical language and a ground term t in this language. Let c be a constant symbol with c=t. An annotated constant is then a triple (c,t,a), in which a is the annotation. The annotation a is any object (making use of an arbitrary data structure) from which c and t can be reconstructed. Think of c as the name of the object, t as the representation within logic, and a as a representation of the object outside logic.

Finite Sets:
Finite sets have a special notation in the mathematical vernacular, for example, the set consisting of the three elements a, b, and c is denoted by {a,b,c}. We can define this by giving the set a name, e.g., A, and a definition in logic as a ground term. Important knowledge about sets with which it is appropriate to reason efficiently is: sets are equal if they contain the same elements regardless of their order, or the union of two sets consists of the elements which are a member of one of the sets and so on. This type of set manipulation has not so much to do with logical reasoning as it has with computation. The union of two sets, for instance, can be very efficiently computed and should not be part of the process of search for a proof.

Annotated constants for finite sets are defined with the attributes

The basic functionality for handling annotated constants is implemented on the term level of the Omega system. In first approximation, an annotated constant is a constant with a definition and has the type of its defining term t. As such it could be replaced by its defining term during the proof or when expanding the proof to check formal correctness. Typically, this is not done, but annotated constants are manipulated via their annotations. The defining term of an annotated term is used only when necessary.

The manipulation of operations and verification of properties is realised as procedural annotations to functions and predicates. A procedural annotation is a triple (f,p,T), where f is a function or predicate of the logical language, p is a procedure of the underlying programming language with the same number of arguments as f and T is a specification (or tactic) for the construction of a formal proof for the manipulation performed by p. The procedure p checks its arguments, performs the simplification, and returns a simplified constant or term together with possible conditions for this operation.

For example, the procedure for the union of concrete sets {a,b} union {c,d} checks whether the arguments are annotated constants for concrete sets, and returns the annotated constant which has the concatenation of {a,b} and {c,d} as annotation. Analogously the property {1,2,3} SUBSET Z holds, when all elements of the annotation of the set are constants which have as annotation an integer as data structure.

The proof specification T is used to formally justify the performed step. Thereby an annotated constant is expanded to its formal definition and the computation is reconstructed by tactic and theorem applications. This expansion will be done only when a low level formal proof is required, certainly not during proof search.

What are the advantages of using annotated constants?

Firstly, annotated constants provide an intermediate representation layer between the intuitive mathematical vernacular and a formal system. With annotated constants it is possible to abstract from the formal introduction of objects, allow the identification of certain classes of objects and enable the access of relevant knowledge about an object directly. Annotations can be translated into full formal logic expressions when necessary, but make it possible to work and reason with mathematical objects in a style that abstracts from the formal construction.

Secondly, annotations allow for user friendly input and output facilities. We extended Omega's input language to provide a markup for an annotated constant to indicate the type of the object it represents. For each kind of annotated constant the term parser is extended by an additional function, which parses annotations and transforms these annotations into an internal representation. During parsing additional properties can be checked and errors in the specification can be detected. In this way it is possible to extend syntactic type checking. An additional output function for each kind of annotated constant allows to have different display forms for presenting formulas to the user.

Thirdly, procedural annotations enable an efficient manipulation of annotated constants. Theses procedures can access information without further analysis on (lambda) terms (which define annotated constants formally) and allows to compute standard functions and relations very efficiently. These operations and properties become a computation on the data structures of annotated constants.

4.  Matrices as Annotated Constants

In this section we show how annotated constants can be used to implement the different representations for matrices presented in section 2.

4.1.  Concrete Matrices

Concrete matrices of fixed sizes, for example, M=

3   2   7
1   0   4
can be represented in higher-order logic as a 4-tuple: (f,2,4,Z) where f is the lambda expressionThe conditional IF P THEN m ELSE n can be defined in higher-order logic by the description operator i . The expression i y. S(y) denotes the unique element c such that S(c) holds, if such a unique element exists. A conditional can thus be defined by i k . (P  &  (k = m))  |  (NOT  P  &  (k = n)), which returns m if P holds, else n (for more details see [Andrews02]).
lambda i lambda j.   IF   i=1  &  j=1   THEN 3
  ELSEIF   i=1  &  j=2   THEN 2
  ELSEIF   i=1  &  j=3   THEN 7
  ELSEIF   i=2  &  j=1   THEN 1
  ELSEIF   i=2  &  j=2   THEN 0
  ELSE     4.

When we look at the concrete matrix the information connected to this representation is, that the position of all the elements is specified and that the number of rows and columns of the matrix are immediately perceivable. When we look at the formal representation, then even to access an element at a certain position requires reasoning, even non-trivial reasoning when, for example, the first condition is given as f lambda i lambda j . IF P(i,j) THEN 3 ELSEIF ... where P(i,j) is some equivalent formulation of i=1  &  j=1.

Also in order to multiply two concrete matrices, reasoning about the corresponding lambda expressions is necessary. For instance, the transpose of M is

MT=

3   1
2   0
7   4
which can be represented as a 4-tuple (fT,4,2,Z) where fT is the function you get by swapping the arguments of f, i.e. lambda j lambda i rather than lambda i lambda j. The product M * MT is a matrix (f * 4 fT,2,2,Z), in which the function is computed component wise as the sum of products of ring elements, that is, lambda i lambda j . IF i=1  &  j=1 THEN 3* 3 + 2* 2 + 7* 7 ELSEIF ... , which requires considerable reasoning to arrive at the result. We argue that although this can be done in logic, it is not appropriate, analogously as it is not appropriate to compute a product such as 20* 15 by reasoning with the definition of * in the constructors 0 and s over the natural numbers.

We therefore define annotated constants for concrete matrices as follows:

For annotated constants representing concrete matrices the operations for summation and multiplication of matrices, scalar multiplication, and transposing a matrix are annotated by corresponding procedures. With the tactic simplify which applies all possible simplifications specified in annotated procedures, a proof step performing the matrix multiplication M * MT is:

p4cmp3.3cm
L1.   {0,1,2,3,4,7} in R   (open)
L2.   Ring(R,+,*)   (open)
L3.   P=
62   31
31   17
  (open)
L4.   P=
3   2   7
1   0   4
*
3   2   7
1   0   4
T
  (simplify L1,L2,L3)

The line L3 contains the matrix which is the result of the simplification. Since the matrices consist of integers, which are annotated constants again, the simplification can compute the result of arithmetic operations on integers. The lines L1 and L2 contain the side conditions of the computation. Note that the necessary conditions on the size of the matrices Matrix (

3   2   7
1   0   4
,2,3,R) and Matrix (
3   2   7
1   0   4
T,3,2,R) are available from the annotation and thus can be checked during the expansion of the tactic simplify.

4.2.  Block matrices

A block matrix of the form M=

a   vT
0   A
can be formally expressed in logic as a tuple (f,n+1,n+1,F), in which f is a function from the indices into the ring, {1,... ,n+1} x {1,... ,n+1} ->  F defined as
lambda i lambda j.   IF   i=1  &  j=1   THEN a
  ELSEIF   i=1 &  2<= j<= n+1   THEN vj-1
  ELSEIF   2<= i<= n+1  &  j=1   THEN 0
  ELSE     ai-1,j-1
If we assume a =/=  0 and det(A) =/=  0, we can then show that the set of matrices of the above form constitute a subgroup of the group of invertible matrices with respect to matrix multiplication. This is, however, not straightforward using the lambda expression, since a lot of the structural information is needed for the argument, which is lost in the lambda term. What we really want to do is to lift the argument about 2 x 2 block matrices to a sound argument about general matrices.

In effect the argument can be collapsed into a single computation. By emulating the block matrix with a 2 x 2 matrix over a non-commutative ring we can compute its inverse using the computer algebra system Mathematica [Wolfram03] together with the NCAlgebra package [Ncalgebra] for non-commutative algebra. If we replace the block matrix with a matrix of the form

a   b
0   c
, the corresponding inverse matrix is
a-1   -a-1* b* c-1
0   c-1
Note that -a-1* b* c-1 can not be further simplified to -b / a* c, since matrix multiplication is non-commutative. This computation on concrete matrices can be used to determine the inverse of the original block matrix by simply substituting vT for b and A for c: c
a-1   -a-1* vT* A-1
0   A-1
With the additional fact that a-1 and A-1 exist if and only if a =/=  0 and det(A) =/=  0, the property can be proved.

Block matrices are implemented as annotated constants in the following way:

The annotated constants for block matrices allow us to combine matrices given by lambda expressions with concrete matrices. The operations for matrices then work directly on the individual blocks of the block matrices. Given double indexed functions uij,vij,Aij,Bij the tactic simplify applied to the formula in L8 results in the following proof situation:

p5cmp3.3cm
L1.   Matrix (uij,1,2,R)   (open)
L2.   Matrix (vij,1,2,R)   (open)
L3.   Matrix (Bij,2,2,R)   (open)
L4.   Matrix (Aij,2,2,R)   (open)
L5.   {0,1} in R   (open)
L6.   Ring(R,+,*)   (open)
L7.   M=
(1)   uij + (vij * Bij)
0
0
  Aij * Bij
  ( open)
L8.   M=
(1)   vij

0
0
  Aij
*
(1)   uij
0
0
  Bij
  ( simplify L1,... ,L7)

Line L7 contains the result of the matrix multiplication and lines L1 to L6 the side conditions on the objects involved, which cannot be inferred from the annotations.

We describe the simplification stepwise. First the sub-blocks of the matrices are multiplied, resulting in ] Already at this point the side conditions regarding the double indexed functions uij,vij,Aij,Bij are generated. The requirements for the size can be reconstructed from the sizes of the concrete matrices together with the condition from the matrix multiplication. Then simplification is applied to the content of each sub-block, starting with simplification of the arguments of an operation.

For concrete matrices operations are performed as described in section 4.1. For the simplification of operations containing both concrete matrices and double indexed functions we only consider the following cases:

Simplifying the above block matrix with respect to the rules for multiplication then yields ] Further simplification employs the rules for addition and also scalar multiplication on 1* uij resulting in the formula given in L7 of the above proof.

The simplification for operations on matrices with mixed representations could also be carried out differently, namely by introducing concrete matrices, that is ] and then apply simplification on the elements of the matrix. For a=b=0 the result will be the same as for our simplification, but in the general case, the result is a concrete matrix having elements of the double indexed mixed with the elements of the concrete matrix. This means the structure of the initial blocks would be lost or hard to recognise.

While our example works with 3 x 3 matrices represented as 2 x 2 block matrices, the argument can be extended to arbitrary n x n matrices still represented as 2 x 2 block matrices of the form:

(1)   uij
0

.
.
.
0
  Aij
But in order to do this we need an adequate treatment of ellipses.

4.3.  Ellipses

While block matrices already allow us to combine concrete matrices using arbitrary double indexed functions, they only enable us to combine rectangular shapes. Using ellipses we can further generalise the representation of matrices. We then need to generalise also the simplifications introduced in the last section to matrices with fixed but arbitrary sizes. If we consider our example matrix

A=

a11   b   ...   b
0  
.
   .
      .
 
.
   .
      .
 
.
.
.

.
.
.
 
.
   .
      .
 
.
   .
      .
  b
0   ...   0   ann
then A can be represented in higher-order logic as a 4-tuple: (f,n,n,Z) where f corresponds to the lambda expression:
lambda i lambda j.   IF   i=j   THEN aij
  IF   i<j   THEN b
  ELSE     0

Compared to the concrete instances above, the higher-order representation is concise. Nevertheless, in mathematics one develops particular methods for reasoning with matrices of non-concrete sizes, which follow a particular pattern, such as triangle matrices, diagonal matrices, and step matrices. Since these patterns are not necessarily obvious given the lambda term alone it is desirable to have the explicit representation of matrices with ellipses available for reasoning.

Ellipses are realised using annotated constants as well. They are categorised into horizontal, vertical, and diagonal ellipses and have the following four attributes that connect them within the matrix and determine their meaning:

The values for the attributes are determined during parsing of the expression. Thereby not all combinations of ellipses are permitted. Essentially, we distinguish three basic modules a matrix can consists of:

  1. points, i.e. single concrete elements.
  2. lines, i.e. an ellipsis or a sequence of ellipses of the same form together with concrete elements as start and end points. An example of a line comprised of more than one ellipsis is for instance the main diagonal of A where two diagonal ellipses constitute a line from a11 to ann.
  3. triangles, i.e. a combination of a horizontal, a vertical and a diagonal line. Since we only allow for one type of diagonal ellipsis, the we can get exactly two different types of isosceles right triangles:

    *   ...   *
     
    .
       .
          .
     
    .
    .
    .
        *


    *    

    .
    .
    .
     
    .
       .
          .
    *   ...   *
     

Both start and end elements of an ellipsis are determined by searching for a concrete element in the respective direction (i.e., left/right, up/down, etc.) while ignoring other ellipses. Both element and range are computed given the start and end: If the start and end terms are the same then this term is taken to be the element the ellipsis represents and no range needs to be computed. In case they are not the same we try to compute a schematic term using unification. Although the unification fails it will provide us with a disagreement set on the two terms, which can be used to determine the position of possible schematic variables. If the disagreement set is sensible, that is, it consists only of terms representing integers, the schematic term is constructed and the ranges for the schematic variables are computed.

We illustrate how exactly ranges are computed with the help of some examples. Consider the vector (a1n ... an1), the schematic term is then axichi and the ranges are xi in {1,... ,n} and chi in {n,... ,1}. Since these ranges are both over the integer and compatible, in the sense that they are of the same length, the ellipsis is fully determined. As an example of incompatible ranges consider the vector (a1k ... an1); without further information on n and k the computation of the ellipsis will fail. Currently the increment of the range sets is always assumed to be 1. The computation of possible index sets is currently more a pragmatic one and rather simple. It is definitely not complete since there are many more possible uses of indices conceivable in our context.

An ellipsis is said to be computable if we can determine both begin and end element, if the element is either a concrete element or a schematic term, and if sensible and compatible integer ranges can be computed. Otherwise parsing of an ellipsis will fail. An ellipsis within a matrix gets the same type as the elements of that matrix. This means ellipses are generally treated as ordinary terms of the matrix, in particular with respect to input, display, and internal representation. For instance, our example matrix A is input as the 4 x 4 matrix

((a(1,1)  b      hdots  b     ) 
 (0       ddots  ddots  vdots )
 (vdots   ddots  ddots  b     ) 
 (0       hdots  0      a(n,n)))
and is also represented internally as a 4 x 4 array. However, the simplifications can use the information provided by the ellipsis during the reasoning process.

When a matrix containing ellipses is expanded into a lambda term the expansion algorithm translates the ellipses into appropriate conditions for the if-then-else statements. Thereby the matrix is first scanned and broken down into its components, i.e. points, lines, and triangles. These can then be associated with corresponding index sets and translated into a lambda expression. For instance the diagonal ellipsis in our example matrix A above can be simply translated into the conditional IF i=j THEN aij, while the areas above and below the main diagonal where a horizontal, a vertical, and a diagonal ellipsis bound the area in which all the elements are either 0 or b. In an additional optimisation step neighbouring triangles are compared and can be combined to form rectangular areas.

The simplification for operations on matrices are extended to the cases where matrices contain ellipses. For example, the sum of matrices where both matrices contain ellipses at the same positions results in a matrix containing the sum of concrete elements and the ellipses between those elements. The multiplication of a diagonal matrix containing the same element on the diagonal is reduced to scalar multiplication with this element.

5.  Conclusions

Formal representations of mathematical objects often do not model all important aspects of that object. Especially some of the structural properties may be lost or hard to recognise and reconstruct. In our work we investigated these structural properties for the case of matrices where there exist different representations for different purposes. Each representation has certain reasoning techniques attributed to it.

We modelled the structural knowledge about concepts with the help of annotations, which are used to identify objects and to store information about them. We implemented the different representations for matrices as annotated constants and showed how basic simplifications are performed. The representations for block matrices and ellipses allow us to represent matrices of a general form. Annotations are also used for manipulations of objects. Instead of deduction on formulas, many manipulations can be reduced to computations on annotations. Since we are able to express general matrices, we can express general properties and theorems based on our formalism. With simplifications performed on generalised matrices we are now able to express complex reasoning in the form of computational steps. In future work we want to investigate how this can further aid in the construction of actual proofs. Remember that we currently deal annotated constants, that is, only with ground terms. Thus, it would be useful to extend the work in a way that allows also to deal with variables.

Annotations preserve the correctness by their implementation as constants of the formal language. The proof construction is split into a phase where steps are performed based on the richer knowledge contained in annotations and a verification phase where these steps are expanded to calculus level proofs. The expansion is currently only implemented for a subset of the operations. The expansion mechanism for proof steps using annotations needs to be simplified and generalised. The use of canonical forms should help keeping this expansion simple.

Our work compares to de Bruijn's idea of a mathematical vernacular [Bruijn94], which should allow to write everything mathematicians do in informal reasoning, in a computer assisted system as well. In this tradition, Elbers looked in [Elbers98] at aspects of connecting informal and formal reasoning, in particular the integration of computations into formal proofs. Kamareddine and Nederpelt [KamareddineNederpelt2004] have formalised de Bruijn's idea further. While the approach to a mathematical vernacular is general, to our knowledge no attempt has been made to incorporate concrete objects like matrices directly. In the Theorema system Kutsia [kutsia02unification] has worked with sequence variables which stand for symbols of flexible arity. Sequence variables have some similarities to our ellipses. However, as opposed to sequence variables our ellipses allow only fixed interpretations. Moreover sequence variables can be viewed as an extension of the logical system which allows to deal with these expressions within the logic. The main emphasis of our work is to allow for representation within logic and extra-logical manipulation of expressions at the same time. Bundy and Richardson [BundyRichardson99] introduced a general treatment for reasoning about lists with ellipses in a way that they consider an ellipsis as a schema which stands for infinitely many expressions and a proof about ellipses stands for infinitely many proofs, which can be generated from a meta-argument.

References


© 2004, Springer-Verlag. LNCS 3119. MKM, Martin Pollet, Volker Sorge, Manfred Kerber
The URL of this page is http://www.cs.bham.ac.uk/~mmk/papers/04-MKM.html.
Also available as pdf ps.gz bib .