ABSTRACT
A major obstacle for bridging the gap between textbook mathematics and formalising it on a computer is the problem of how to adequately capture the intuition inherent in mathematical notation when formalising mathematical concepts. While logic is an excellent tool to represent certain mathematical concepts, it often fails to retain all the information implicitly given in the representation of some mathematical objects. In this paper we concern ourselves with matrices, whose representation can be particularly rich in implicit information. We analyse different types of matrices and present a mechanism that represents them very close to their textbook-style appearance and captures the information contained in this representation, but that nevertheless allows for their compilation into a formal logical framework. This firstly allows for a more human-oriented interface and secondly enables efficient reasoning with matrices.
A big challenge for formalising mathematics on computers is still to choose a representation that is on the one hand close to that in mathematical textbooks and on the other hand sufficiently formal to permit formal reasoning. While there has been much work on intermediate representations via a `mathematical vernacular' [Bruijn94,WeWi02,KamareddineNederpelt2004], most of this work concentrates on representing mathematical proofs in a way that closely resembles the human reasoning style. Little attention has been paid to an adequate representation of concrete mathematical objects, one which captures all the information and intuition that comes along with their particular notation. Logicians are typically happy with the fact that such concepts can be represented in some way, whereas users of a formal system are more concerned with the question of how to represent a concept and how much effort is necessary to represent it. Depending on the purpose of the representation, it is also important how easy it is to work with.
In this paper we examine one particular type of mathematical objects, namely matrices. Let us first take a closer look at how the matrix concept is introduced in a mathematical textbook. Lang [Lang84, p.441] writes:
"By an m x n matrix in R one means a doubly indexed family of elements of R, (a_{ij}), (i=1,...,m and j=1,...,n), usually written in the form

( a_{11}  ...  a_{1n} )
(  ...          ...   )
( a_{m1}  ...  a_{mn} )"
Mathematicians do not work with the definition alone. The definition already introduces the representation as a rectangular form in which the elements of a matrix are ordered with respect to their indices. Matrices can be viewed as collections of row or column vectors, as block matrices, and various types of ellipses are used to describe the form of a matrix. The different representations are used to make the relevant information directly accessible and ease reasoning.
Depending on the exact logical language, one would consider a matrix as a tuple consisting of a doubly indexed function, the number of rows, the number of columns, and the underlying ring. As opposed to mathematics, one has to stick to this definition during all proofs. The (logical) view that a matrix is a tuple, which mainly bears aspects of a function, is not adequate from a mathematical point of view. If we look at a concrete matrix, such as a 2 x 2 matrix containing only the zero element, this matrix Z is a constant. This means in particular that for any matrix M, the product M * Z is equal to Z without the necessity of reasoning about tuples and lambda expressions. This is analogous to the relationship between the formal logical and the mathematical view of the natural number four, which logically is the ground term s(s(s(s(0)))), while mathematically it is the constant symbol 4.
In this paper we show how to abstract from the functional representation of concrete matrices and how to attach structural information for matrices to the formal representation by so-called annotated constants. The structural information can be used for reasoning, which simplifies proof construction since some of the reasoning steps can be expressed as computations. The connection to the formal content allows the verification of the abstract proof steps in the underlying logical calculus.
In the next section we will have a closer look at different types of matrices that we want to be able to represent. In section 3 we will introduce annotated constants as an intermediate representation for mathematical objects. In section 4 we will discuss how concrete matrices, block matrices and ellipses can be represented and manipulated as annotated constants. We conclude with a discussion of related and future work in section 5.
In this section we give an overview of some important cases of matrices and their representations as they appear in algebra books (e.g. [Lang84,Koecher83]). We do not intend to give an exhaustive overview. However, we believe that the cases covered here will allow for generalisations and adaptations to others. We discuss some of the representational issues, for which we propose solutions in the subsequent sections of the paper.
A matrix is a finite set of entries arranged in a rectangular grid of rows and columns. The number of rows and columns of a matrix is often called the size of that matrix. The entries of a matrix are usually elements belonging to some algebraic field or ring. Matrices often occur in a very concrete form. That is, the exact number of rows and columns as well as the single entries are given. An example is the following 2 x 3 matrix:

M = ( 3  2  7 )
    ( 1  0  4 )
While concrete matrices often occur in many engineering disciplines, pure mathematics normally goes beyond concrete sizes and speaks of matrices in a more generalised fashion.
Block matrices are matrices of fixed sizes, typically 2 x 2 or 3 x 3, whose elements consist of rectangular blocks of elements of not necessarily determined size. Thus, block matrices are in effect shorthand for much larger structures, whose internal format can be captured in a particular pattern of blocks. Consider, for instance, the general matrix of size (n+1) x (n+1) given as

M = ( a  v^T )
    ( 0  A   )
A block matrix can be emulated with a concrete matrix by regarding its elements as matrices themselves. This can be achieved by identifying scalars with 1 x 1 matrices, vectors with n x 1 matrices, and transposed vectors with 1 x n matrices. While this enables the use of the techniques available for concrete matrices to input and represent block matrices, manipulating block matrices is not as straightforward. Since the elements no longer belong to the same algebraic ring (or indeed to any ring), computations can only be carried out with respect to a restricted set of axioms. In particular, one generally has to forgo commutativity when computing with the single blocks. Again, this can be simulated to a certain extent. For instance, the inverse of the above matrix can be computed by using a computer algebra system that can deal with noncommutativity, as demonstrated in section 4.2. Computations involving two block matrices, such as matrix addition or multiplication, can be simulated as well. However, care has to be taken that the sizes of the blocks are compatible.
While block matrices can capture simple patterns in a matrix in an abstract way, more complex patterns in generalised matrices are usually described using ellipses. Consider for instance the definition of the following matrix A:

A = ( a_{11}  b    ...  b      )
    ( 0       ⋱    ⋱   ⋮      )
    ( ⋮       ⋱    ⋱   b      )
    ( 0       ...  0    a_{nn} )
In the context of matrices we can distinguish essentially two types of ellipses:
Another important feature of ellipses in the context of matrices is their orientation. While for most mathematical objects, such as sets or vectors, ellipses can occur in exactly one possible orientation, in two-dimensional objects such as matrices we can distinguish three different orientations: horizontal, vertical, and diagonal. Ellipses are thus not only placeholders for single rows, columns, or diagonals; a combination of ellipses can determine the composition of a whole area within the matrix. For example, the interaction of the diagonal, horizontal, and vertical ellipses between the three 0s in A determines that the whole area underneath the main diagonal defaults to 0.
Matrices with ellipses are well suited for manipulation and reasoning at an intuitive, abstract level. However, already their mere representation poses some problems. While they can be linearised with some effort, a two-dimensional representation of the object is preferable, as this makes it easier to determine the actual meaning of the occurring ellipses. It is even more challenging to mechanise abstract reasoning or abstract computations on matrices with ellipses.
While ellipses already provide a powerful tool to express matrices in a very general form by specifying a large number of possible patterns, one sometimes wants to be even more general than that. Consider for instance the following definition of matrix B:

B = ( a_{11}  *    ...  *      )
    ( 0       ⋱    ⋱   ⋮      )
    ( ⋮       ⋱    ⋱   *      )
    ( 0       ...  0    a_{nn} )

which is often abbreviated as

B = ( a_{11}      *     )
    (       ⋱          )
    ( 0        a_{nn}   )
In the following we shall describe how we can handle concrete matrices, block matrices, and ellipses. In order to extend our approach to cover generalised matrices as well, we would additionally need to handle "don't care" symbols. We do not go into this question in this paper.
In this section we present the concept of annotated constants, a mechanism that provides a representational layer that can both capture the properties of the intuitive mathematical representation of objects, as well as connect these objects to their corresponding representation in a formal logic framework. Annotated constants are implemented in the Omega system [Omega02] and therefore the logical framework is Omega's simply typed lambda calculus (cf. [Andrews02]). We have first introduced annotated constants in [PoSo03] in the context of mathematical objects, such as numbers, lists, and permutations. For the sake of clarity we explain the idea in the following using the much simpler example of finite sets.
Let us assume a logical language and a ground term t in this language. Let c be a constant symbol with c=t. An annotated constant is then a triple (c,t,a), in which a is the annotation. The annotation a is any object (making use of an arbitrary data structure) from which c and t can be reconstructed. Think of c as the name of the object, t as the representation within logic, and a as a representation of the object outside logic.
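The triple structure of an annotated constant can be sketched in a few lines of Python; the class and function names below are our own illustration and not the Omega system's actual interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass(frozen=True)
class AnnotatedConstant:
    name: str                          # c: the constant symbol
    defining_term: Callable = field()  # t: the representation inside logic
    annotation: Any = None             # a: the representation outside logic

# A concrete 2x2 zero matrix: the annotation is a plain list of rows,
# the defining term is the doubly indexed function the logic would use.
zero = AnnotatedConstant(
    name="Z",
    defining_term=lambda i, j: 0,
    annotation=[[0, 0], [0, 0]],
)

# The requirement that c and t be reconstructible from a: here the
# defining term can be rebuilt by looking entries up in the annotation.
def term_from_annotation(rows):
    return lambda i, j: rows[i - 1][j - 1]

assert term_from_annotation(zero.annotation)(1, 2) == zero.defining_term(1, 2)
```

Manipulations can then be performed on the `annotation` field directly, while `defining_term` stays available for expansion into the logic.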
Annotated constants for finite sets are defined with the attributes
The basic functionality for handling annotated constants is implemented on the term level of the Omega system. To a first approximation, an annotated constant is a constant with a definition and has the type of its defining term t. As such it could be replaced by its defining term during the proof, or when expanding the proof to check formal correctness. Typically this is not done; instead, annotated constants are manipulated via their annotations. The defining term of an annotated constant is used only when necessary.
The manipulation of operations and verification of properties is realised via procedural annotations of functions and predicates. A procedural annotation is a triple (f,p,T), where f is a function or predicate of the logical language, p is a procedure of the underlying programming language with the same number of arguments as f, and T is a specification (or tactic) for the construction of a formal proof of the manipulation performed by p. The procedure p checks its arguments, performs the simplification, and returns a simplified constant or term together with possible conditions for this operation.
For example, the procedure for the union of concrete sets {a,b} union {c,d} checks whether the arguments are annotated constants for concrete sets, and returns the annotated constant which has the concatenation of {a,b} and {c,d} as annotation. Analogously, the property {1,2,3} SUBSET Z holds when all elements of the annotation of the set are themselves constants whose annotation is an integer data structure.
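The set-union procedure just described can be sketched as follows; the function names and the dictionary encoding of annotations are our own illustration, not Omega's actual interface.

```python
# Sketch of the p-component of a procedural annotation (f, p, T) for
# set union and for the SUBSET-of-Z property.

def is_concrete_set(c):
    """Check that the argument is an annotated constant for a concrete set."""
    return isinstance(c, dict) and c.get("kind") == "finite-set"

def union_proc(s1, s2):
    """p: check the arguments, then compute the union on the annotations."""
    if not (is_concrete_set(s1) and is_concrete_set(s2)):
        return None  # fall back to reasoning on the defining lambda terms
    elems = list(s1["elements"])
    elems += [e for e in s2["elements"] if e not in elems]
    return {"kind": "finite-set", "elements": elems}

def subset_of_Z(s):
    """{1,2,3} SUBSET Z holds when every element annotation is an integer."""
    return is_concrete_set(s) and all(isinstance(e, int) for e in s["elements"])

ab = {"kind": "finite-set", "elements": [1, 2]}
cd = {"kind": "finite-set", "elements": [2, 3]}
assert union_proc(ab, cd)["elements"] == [1, 2, 3]
assert subset_of_Z(ab)
```

A real implementation would additionally return the side conditions of the operation, as described above.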
The proof specification T is used to formally justify the performed step. Thereby an annotated constant is expanded to its formal definition and the computation is reconstructed by tactic and theorem applications. This expansion is done only when a low-level formal proof is required, certainly not during proof search.
What are the advantages of using annotated constants?
Firstly, annotated constants provide an intermediate representation layer between the intuitive mathematical vernacular and a formal system. Annotated constants make it possible to abstract from the formal introduction of objects, to identify certain classes of objects, and to access relevant knowledge about an object directly. Annotations can be translated into full formal logic expressions when necessary, but make it possible to work and reason with mathematical objects in a style that abstracts from the formal construction.
Secondly, annotations allow for user-friendly input and output facilities. We extended Omega's input language to provide a markup for an annotated constant to indicate the type of the object it represents. For each kind of annotated constant the term parser is extended by an additional function, which parses annotations and transforms them into an internal representation. During parsing, additional properties can be checked and errors in the specification can be detected. In this way it is possible to extend syntactic type checking. An additional output function for each kind of annotated constant allows for different display forms when presenting formulas to the user.
Thirdly, procedural annotations enable an efficient manipulation of annotated constants. These procedures can access information directly, without further analysis of the (lambda) terms that define annotated constants formally, and allow computing standard functions and relations very efficiently. These operations and properties thus become computations on the data structures of annotated constants.
In this section we show how annotated constants can be used to implement the different representations for matrices presented in section 2.
Concrete matrices of fixed sizes, for example

M = ( 3  2  7 )
    ( 1  0  4 )

can be represented in the logical language by a lambda expression such as

lambda i. lambda j. IF     i=1 & j=1  THEN 3
                    ELSEIF i=1 & j=2  THEN 2
                    ELSEIF i=1 & j=3  THEN 7
                    ELSEIF i=2 & j=1  THEN 1
                    ELSEIF i=2 & j=2  THEN 0
                    ELSE               4.
When we look at the concrete matrix, the information connected to this representation is that the position of all elements is specified and that the number of rows and columns of the matrix is immediately perceivable. When we look at the formal representation, even accessing an element at a certain position requires reasoning, and possibly non-trivial reasoning when, for example, the first condition is given as lambda i. lambda j. IF P(i,j) THEN 3 ELSEIF ..., where P(i,j) is some equivalent formulation of i=1 & j=1.
Also, in order to multiply two concrete matrices, reasoning about the corresponding lambda expressions is necessary. For instance, the transpose of M is

M^T = ( 3  1 )
      ( 2  0 )
      ( 7  4 )
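Performed on the annotation instead of on the lambda term, transposition amounts to a simple computation. A minimal Python sketch, where the list-of-rows encoding of the annotation is our own illustration:

```python
# Transposing on the annotation (a list of rows) instead of reasoning
# about the defining lambda term; a sketch, not Omega's implementation.

def transpose(rows):
    """Swap rows and columns of a concrete matrix annotation."""
    return [list(col) for col in zip(*rows)]

M = [[3, 2, 7],
     [1, 0, 4]]

assert transpose(M) == [[3, 1], [2, 0], [7, 4]]
```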
We therefore define annotated constants for concrete matrices as follows:
For annotated constants representing concrete matrices, the operations for summation and multiplication of matrices, scalar multiplication, and transposing a matrix are annotated by corresponding procedures. With the tactic simplify, which applies all possible simplifications specified in annotated procedures, a proof step performing the matrix multiplication M * M^T is:

L_1.  {0,1,2,3,4,7} in R          (open)
L_2.  Ring(R,+,*)                 (open)
L_3.  P = ( 62  31 )              (open)
          ( 31  17 )
L_4.  P = M * M^T                 (simplify L_1, L_2, L_3)

The line L_3 contains the matrix which is the result of the simplification. Since the matrices consist of integers, which are annotated constants again, the simplification can compute the result of the arithmetic operations on integers. The lines L_1 and L_2 contain the side conditions of the computation. Note that the necessary conditions on the sizes of the matrices, such as Matrix(M, 2, 3, R), can be checked directly on the annotations and therefore need not appear as explicit side conditions.
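The arithmetic carried out by the simplify step can be replayed in a few lines; the following is plain Python for illustration, not the Omega implementation.

```python
# Replaying the simplification of M * M^T on concrete annotations,
# mirroring what the annotated multiplication procedure computes.

def matmul(A, B):
    """Multiply two concrete matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(rows):
    return [list(col) for col in zip(*rows)]

M = [[3, 2, 7],
     [1, 0, 4]]

assert matmul(M, transpose(M)) == [[62, 31], [31, 17]]
```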
A block matrix of the form

M = ( a  v^T )
    ( 0  A   )

corresponds to the lambda expression

lambda i. lambda j. IF     i=1 & j=1          THEN a
                    ELSEIF i=1 & 2<=j<=n+1    THEN v_{j-1}
                    ELSEIF 2<=i<=n+1 & j=1    THEN 0
                    ELSE                           a_{i-1,j-1}
In effect the argument can be collapsed into a single computation. By emulating the block matrix with a 2 x 2 matrix over a noncommutative ring we can compute its inverse using the computer algebra system Mathematica [Wolfram03] together with the NCAlgebra package [Ncalgebra] for noncommutative algebra. If we replace the block matrix with a matrix of the form

( a  b )
( 0  c )

the computed inverse is

( a^{-1}  -a^{-1} * b * c^{-1} )
( 0        c^{-1}              )

which translates back to the block matrix

( a^{-1}  -a^{-1} * v^T * A^{-1} )
( 0        A^{-1}                )
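The block-inverse formula can be spot-checked numerically. The following numpy sketch uses arbitrary test data of our own choosing; it is a sanity check, not part of the described system.

```python
# Numerical spot check of the block inverse formula
#   [a  v^T; 0  A]^{-1} = [a^{-1}  -a^{-1} v^T A^{-1}; 0  A^{-1}]

import numpy as np

n = 3
rng = np.random.default_rng(0)
a = 2.0
v = rng.standard_normal((n, 1))
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to keep A invertible

# Assemble the (n+1) x (n+1) matrix from its blocks.
M = np.block([[np.array([[a]]), v.T],
              [np.zeros((n, 1)), A]])

Ainv = np.linalg.inv(A)
Minv_formula = np.block([[np.array([[1 / a]]), -(1 / a) * v.T @ Ainv],
                         [np.zeros((n, 1)), Ainv]])

assert np.allclose(np.linalg.inv(M), Minv_formula)
```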
Block matrices are implemented as annotated constants in the following way: the block matrix

( A  B )
( C  D )

has the defining term

lambda i. lambda j. IF     1<=i<=r_1 & 1<=j<=c_1        THEN A(i,j)
                    ELSEIF 1<=i<=r_1 & c_1+1<=j         THEN B(i, j-c_1)
                    ELSEIF r_1+1<=i & 1<=j<=c_1         THEN C(i-r_1, j)
                    ELSE [i.e., r_1+1<=i & c_1+1<=j]         D(i-r_1, j-c_1),
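The four-way conditional of this defining term can be written out as an executable sketch; the function names and the encoding of blocks as Python functions are our own illustration.

```python
# The block-matrix defining term as a Python function: given the four
# blocks as doubly indexed functions and the split point (r1, c1),
# return the doubly indexed function of the whole block matrix.

def block_entry(A, B, C, D, r1, c1):
    """Return the doubly indexed function of the block matrix [A B; C D]."""
    def entry(i, j):
        if i <= r1 and j <= c1:
            return A(i, j)
        elif i <= r1:                 # c1 + 1 <= j
            return B(i, j - c1)
        elif j <= c1:                 # r1 + 1 <= i
            return C(i - r1, j)
        else:                         # r1 + 1 <= i and c1 + 1 <= j
            return D(i - r1, j - c1)
    return entry

# Tiny usage check with 1x1 blocks named after their position.
f = block_entry(lambda i, j: "A", lambda i, j: "B",
                lambda i, j: "C", lambda i, j: "D", r1=1, c1=1)
assert [f(1, 1), f(1, 2), f(2, 1), f(2, 2)] == ["A", "B", "C", "D"]
```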
The annotated constants for block matrices allow us to combine matrices given by lambda expressions with concrete matrices. The operations for matrices then work directly on the individual blocks of the block matrices. Given double indexed functions u_{ij},v_{ij},A_{ij},B_{ij} the tactic simplify applied to the formula in L_{8} results in the following proof situation:
L_1.  Matrix(u_{ij}, 1, 2, R)   (open)
L_2.  Matrix(v_{ij}, 1, 2, R)   (open)
L_3.  Matrix(B_{ij}, 2, 2, R)   (open)
L_4.  Matrix(A_{ij}, 2, 2, R)   (open)
L_5.  {0,1} in R                (open)
L_6.  Ring(R,+,*)               (open)
L_7.  M = ...                   (open)
L_8.  M = ...                   (simplify L_1, ..., L_7)
Line L_{7} contains the result of the matrix multiplication and lines L_{1} to L_{6} the side conditions on the objects involved, which cannot be inferred from the annotations.
We describe the simplification stepwise. First the subblocks of the matrices are multiplied. Already at this point the side conditions regarding the double indexed functions u_{ij}, v_{ij}, A_{ij}, B_{ij} are generated. The requirements for the sizes can be reconstructed from the sizes of the concrete matrices together with the condition from the matrix multiplication. Then simplification is applied to the content of each subblock, starting with simplification of the arguments of an operation.
For concrete matrices operations are performed as described in section 4.1. For the simplification of operations containing both concrete matrices and double indexed functions we only consider the following cases:
The simplification for operations on matrices with mixed representations could also be carried out differently, namely by introducing concrete matrices and then applying simplification on the elements of the matrix. For a=b=0 the result will be the same as for our simplification, but in the general case the result is a concrete matrix having elements of the double indexed functions mixed with the elements of the concrete matrix. This means the structure of the initial blocks would be lost or hard to recognise.
While our example works with 3 x 3 matrices represented as 2 x 2 block matrices, the argument can be extended to arbitrary n x n matrices still represented as 2 x 2 block matrices of the form:
(1)        ( u_{ij} )
           ( A_{ij} )
While block matrices already allow us to combine concrete matrices using arbitrary double indexed functions, they only enable us to combine rectangular shapes. Using ellipses we can further generalise the representation of matrices. We then also need to generalise the simplifications introduced in the last section to matrices with fixed but arbitrary sizes. If we consider our example matrix

A = ( a_{11}  b    ...  b      )
    ( 0       ⋱    ⋱   ⋮      )
    ( ⋮       ⋱    ⋱   b      )
    ( 0       ...  0    a_{nn} )

it corresponds to the lambda expression

lambda i. lambda j. IF     i=j  THEN a_{ij}
                    ELSEIF i<j  THEN b
                    ELSE             0
Compared to the concrete instances above, the higher-order representation is concise. Nevertheless, in mathematics one develops particular methods for reasoning with matrices of non-concrete sizes that follow a particular pattern, such as triangular matrices, diagonal matrices, and step matrices. Since these patterns are not necessarily obvious from the lambda term alone, it is desirable to have the explicit representation of matrices with ellipses available for reasoning.
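For illustration, the lambda term for A can be instantiated to concrete sizes; the helper below is a sketch of our own, not part of the system.

```python
# Expanding the lambda term for A (a_{ii} on the diagonal, b above it,
# 0 below) into a concrete n x n instance.

def expand_A(n, a, b):
    """Instantiate the pattern for a given size n."""
    return [[a(i) if i == j else (b if i < j else 0)
             for j in range(1, n + 1)]
            for i in range(1, n + 1)]

A4 = expand_A(4, lambda i: f"a{i}{i}", "b")
assert A4[0] == ["a11", "b", "b", "b"]   # first row: a_{11}, then b's
assert A4[3] == [0, 0, 0, "a44"]         # last row: 0's, then a_{nn}
```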
Ellipses are realised using annotated constants as well. They are categorised into horizontal, vertical, and diagonal ellipses and have the following four attributes that connect them within the matrix and determine their meaning:
The values for the attributes are determined during parsing of the expression. Thereby not all combinations of ellipses are permitted. Essentially, we distinguish three basic modules a matrix can consist of: points, lines, and triangular areas, the latter of the form

( *  ...  * )
(     ⋱  ⋮ )
(          * )

and

( *          )
( ⋮  ⋱      )
( *  ...  *  )
Both start and end elements of an ellipsis are determined by searching for a concrete element in the respective direction (i.e., left/right, up/down, etc.) while ignoring other ellipses. Both element and range are computed given the start and end: if the start and end terms are the same, then this term is taken to be the element the ellipsis represents and no range needs to be computed. If they are not the same, we try to compute a schematic term using unification. Although the unification fails, it provides us with a disagreement set on the two terms, which can be used to determine the positions of possible schematic variables. If the disagreement set is sensible, that is, if it consists only of terms representing integers, the schematic term is constructed and the ranges for the schematic variables are computed.
We illustrate how exactly ranges are computed with the help of some examples. Consider the vector (a_1^n ... a_n^1); the schematic term is then a_{xi}^{chi} and the ranges are xi in {1,...,n} and chi in {n,...,1}. Since these ranges are both over the integers and compatible, in the sense that they are of the same length, the ellipsis is fully determined. As an example of incompatible ranges consider the vector (a_1^k ... a_n^1); without further information on n and k the computation of the ellipsis will fail. Currently the increment of the range sets is always assumed to be 1. The computation of possible index sets is currently rather pragmatic and simple. It is definitely not complete, since many more uses of indices are conceivable in our context.
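The range computation just described can be sketched as follows; the tuple encoding of indexed terms and the fixed schematic-variable labels are simplifications of our own, and n is instantiated to a concrete bound since the sketch needs concrete integers.

```python
# Sketch of the range computation for an ellipsis between two indexed
# terms such as a_1^n ... a_n^1, with terms modelled as tuples
# (name, subscript, superscript) and the increment fixed at 1.

def ellipsis_ranges(start, end):
    """Compare start and end position-wise; return a schematic term
    description and integer ranges for the disagreeing positions."""
    if start == end:
        return start, []                  # one repeated element, no range
    if start[0] != end[0]:
        return None, None                 # different names: not sensible
    ranges = []
    for lo, hi in zip(start[1:], end[1:]):
        if lo == hi:
            continue                      # agreeing position stays concrete
        if not (isinstance(lo, int) and isinstance(hi, int)):
            return None, None             # non-integer disagreement: fail
        step = 1 if lo <= hi else -1      # increment currently always 1
        ranges.append(list(range(lo, hi + step, step)))
    if len({len(r) for r in ranges}) > 1:
        return None, None                 # incompatible range lengths: fail
    return (start[0], "xi", "chi"), ranges

# (a_1^n ... a_n^1) with n = 4: xi runs 1..4 while chi runs 4..1.
schema, rs = ellipsis_ranges(("a", 1, 4), ("a", 4, 1))
assert rs == [[1, 2, 3, 4], [4, 3, 2, 1]]
```

Incompatible ranges, such as those of (a_1^k ... a_n^1) with k of unknown relation to n, would show up here as range lists of different lengths and make the computation fail.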
An ellipsis is said to be computable if we can determine both begin and end element, if the element is either a concrete element or a schematic term, and if sensible and compatible integer ranges can be computed. Otherwise parsing of an ellipsis will fail. An ellipsis within a matrix gets the same type as the elements of that matrix. This means ellipses are generally treated as ordinary terms of the matrix, in particular with respect to input, display, and internal representation. For instance, our example matrix A is input as the 4 x 4 matrix
((a(1,1) b hdots b) (0 ddots ddots vdots) (vdots ddots ddots b) (0 hdots 0 a(n,n))) and is also represented internally as a 4 x 4 array. However, the simplifications can use the information provided by the ellipses during the reasoning process.
When a matrix containing ellipses is expanded into a lambda term, the expansion algorithm translates the ellipses into appropriate conditions for the if-then-else statements. Thereby the matrix is first scanned and broken down into its components, i.e. points, lines, and triangles. These can then be associated with corresponding index sets and translated into a lambda expression. For instance, the diagonal ellipsis in our example matrix A above can simply be translated into the conditional IF i=j THEN a_{ij}, while the areas above and below the main diagonal, each bounded by a horizontal, a vertical, and a diagonal ellipsis, are translated into conditions stating that all their elements are b or 0, respectively. In an additional optimisation step, neighbouring triangles are compared and can be combined to form rectangular areas.
The simplifications for operations on matrices are extended to the cases where matrices contain ellipses. For example, the sum of two matrices that contain ellipses at the same positions results in a matrix containing the sums of the concrete elements and the ellipses between those elements. The multiplication of a diagonal matrix containing the same element on the diagonal is reduced to scalar multiplication with this element.
Formal representations of mathematical objects often do not model all important aspects of those objects. In particular, some of the structural properties may be lost or hard to recognise and reconstruct. In our work we investigated these structural properties for the case of matrices, for which different representations exist for different purposes. Each representation has certain reasoning techniques associated with it.
We modelled the structural knowledge about concepts with the help of annotations, which are used to identify objects and to store information about them. We implemented the different representations for matrices as annotated constants and showed how basic simplifications are performed. The representations for block matrices and ellipses allow us to represent matrices of a general form. Annotations are also used for manipulations of objects. Instead of deduction on formulas, many manipulations can be reduced to computations on annotations. Since we are able to express general matrices, we can express general properties and theorems in our formalism. With simplifications performed on generalised matrices we are now able to express complex reasoning in the form of computational steps. In future work we want to investigate how this can further aid the construction of actual proofs. Remember that we currently deal with annotated constants, that is, only with ground terms. Thus it would be useful to extend the work in a way that also allows dealing with variables.
Annotations preserve correctness through their implementation as constants of the formal language. The proof construction is split into a phase where steps are performed based on the richer knowledge contained in annotations, and a verification phase where these steps are expanded to calculus-level proofs. The expansion is currently implemented only for a subset of the operations. The expansion mechanism for proof steps using annotations needs to be simplified and generalised. The use of canonical forms should help keep this expansion simple.
Our work relates to de Bruijn's idea of a mathematical vernacular [Bruijn94], which should make it possible to write everything mathematicians do in informal reasoning in a computer-assisted system as well. In this tradition, Elbers [Elbers98] looked at aspects of connecting informal and formal reasoning, in particular the integration of computations into formal proofs. Kamareddine and Nederpelt [KamareddineNederpelt2004] have formalised de Bruijn's idea further. While the approach to a mathematical vernacular is general, to our knowledge no attempt has been made to incorporate concrete objects like matrices directly. In the Theorema system, Kutsia [kutsia02unification] has worked with sequence variables, which stand for symbols of flexible arity. Sequence variables have some similarities to our ellipses. However, as opposed to sequence variables, our ellipses allow only fixed interpretations. Moreover, sequence variables can be viewed as an extension of the logical system which makes it possible to deal with these expressions within the logic. The main emphasis of our work is to allow for representation within logic and extra-logical manipulation of expressions at the same time. Bundy and Richardson [BundyRichardson99] introduced a general treatment for reasoning about lists with ellipses, in which an ellipsis is considered as a schema standing for infinitely many expressions, and a proof about ellipses stands for infinitely many proofs, which can be generated from a meta-argument.