TEACH INTERP Chris Mellish and Aaron Sloman === SEMANTIC INTERPRETATION ========================================== The purpose of this demo is to begin to investigate how we might write a computer program to extract the MEANING from a piece of text. By now you should have some idea (from the PARSING demo, LIB GRAMMAR and TEACH GRAMMAR) how we can determine the syntactic structure of a sentence, given a grammar for the language under consideration. In this demo, assume that syntactic structures can be discovered somehow, and concentrate on how the syntactic structure of a sentence can be used as the input for a process of SEMANTIC interpretation. The basic ideas that we will make use of have been used in the formal analysis of both natural and artificial languages, as well as in computer programs to process language. Extracting the Meaning ---------------------- How are we to extract the meaning from something as complex as a book, a paragraph, or even a sentence? Many people have pondered this question, but one principle due to Frege stands out in just about every approach that has been made. It can be informally stated as: The meaning of the whole is a function of the meanings of the parts Frege distinguished at least three different uses of the word 'meaning' called REFERENCE (roughly, what thing or class of things in the world is referred to) SENSE (roughly, HOW it is referred to) and the ASSOCIATIONS suggested by a word or phrase. The third sort of meaning will vary enormously from one individual to another, whereas Frege thought the first two were common to different users of the same language. Frege's principle, stated above applies to the first two senses of 'meaning'. The notion of 'sense' is hard to define precisely, and is very controversial. For present purposes we shall ignore it, and refer only to reference. So we are concerned with the problem of finding the reference of a noun phrase, or the class of objects referred to by a predicate, or the truth-value of a sentence. (Frege showed that the truth-value could be thought of as a special kind of reference.) According to this principle, the reference of a sentence can be expressed in terms of the references of the phrases within it. The references of these in turn depend on the references of the subphrases within them. And so on, until we are down to the references of individual words (or even smaller). Adopting Frege's principle does not make extracting the reference a trivial process. It merely gives us a framework for organising our ideas and decomposing the problem. In particular, we must resolve the following questions: 1. What are the appropriate subphrases to consider when we want to obtain the reference of a phrase? 2. Just how does the reference of a particular phrase depend on that of its subphrases? 3. What are the references of individual words (the phrases that no longer subdivide into smaller phrases)? 4. What kinds of things should meanings be anyway? Are they symbols in the machine? Are they things in the world? Are they some kind of relationship between the two? As for what the appropriate subphrases of a phrase are, we might hope that these would be precisely what are given by the syntactic structure. So, for instance, when we have a rule in the grammar that says: [s [np vp]] then as well as saying something about what forms a sentence can take ("a sentence can be a noun phrase followed by a verb phrase"), it might say something about how the reference is to be extracted ("the reference of the sentence is some function of the references of the noun phrase and the verb phrase"). (You should here be reminded of some of the work you did on 'propositions' in the MENTAL SCHEMATA course.) Of course, it may be asking too much for the same breakdown of a phrase into subphrases to yield both syntactic and semantic generalistions. However, if the reason for using syntactic analysis is only to provide an appropriate preliminary for semantic analysis, we are obviously free to choose our grammar in a way that divides up phrases in a way that is semantically appropriate. We might nevertheless hope that there is some consistent relationship between syntax and semantics in natural languages, and that a semantically appropriate grammar also captures syntactic generalisations. Apparent exceptions might be constructions like Jack, but not his wife, likes fat If alive call the doctor otherwise the undertaker Are these exceptions? Can you find other cases where the syntactic divisions don't seem to correspond to semantic components, in English? Assuming that syntactic analysis divides up phrases correctly into subphrases, it remains for us to specify just how the reference of a complex expression (e.g. the truth-value of a sentence) depends on the references of the parts (e.g. the noun phrase and the verb phrase). It would be desirable to be able to cope with every possible way that complex phrases can be made up of immediate subphrases. This leads to an approach where syntactic and semantic rules are PAIRED - corresponding to each syntactic rule saying how a phrase can be constituted, there is a semantic rule which says how the reference of such a phrase is composed out of the references of the parts. We will shortly be seeing some examples of this. There are two main options for how we can use these semantic interpretation rules. A: As soon as the parser recognises the presence of a phrase consisting of some sequence of subphrases, the reference of the whole phrase is immediately worked out from the references of the parts (at the same time as this bit of parse tree is constructed). In such a regime, the reference of a single word would be constructed as soon as the parser encountered it. Such a possibility would involve adapting the parser, and we will not investigate it here (although we will see that there would be advantages in it). B: The reference is worked out after the parse tree for the whole sentence has been constructed. In this case, the semantic interpretation process must look at the parse tree to determine how the sentence decomposes. It will start by looking at the top ("sentence" node). To find the reference of this, it will need to look at the immediate constituents of this. To find the references of these, it will need to look further down the tree, and so on. This kind of semantic interpretation can be characterised as a kind of recursive tree-walking exercise, where values (references) are associated with the nodes of the tree. You may wish to think about whether either of these two approaches would be appropriate for analysing the SENSE of a complex expression, rather than its REFERENCE. To see how these ideas work in practice, we start with a simple language for arithmetic expressions. Semantic Interpretation for an Expression Language -------------------------------------------------- Consider a restricted artificial language, in which one can write arithmetic expressions, using numbers and the symbols +, *, ( and ). Because of the way LIB GRAMMAR works, we will write numbers as words (and restrict ourselves to small numbers). We will also require that brackets are provided around ALL non-trivial subexpressions of an expression. Given this, the following is an appropriate description of the language. In this language, "expr" is supposed to be the "starting symbol" - the 'top-level' category that utterances in the language are supposed to belong to (compare "sentence" in English). [ [expr [simple] [simple + simple] [simple * simple]] [simple [number] [( expr )]] ] -> grammar; [ [number one two three four five six] ] -> lexicon; Type this in to LIB GRAMMAR and try out a few examples: : REPEAT 9 TIMES EXPR(READLINE())==> ENDREPEAT; Try and give the parser a wide variety of kinds of phrases, so that you can establish just what is and what is not a legal "expr" in the language: What interpretation might we want to place on a phrase in this language? It is reasonable to say that all we can talk about in the language is numbers. We can talk about them more or less directly. For instance, "four" is a way of talking about 4. The phrase "two * (one + one)" is another way of talking about the same number. The have the same REFERENCE though not (according to Frege) the same SENSE. So we will take the 'interpretation' of a phrase to be the number it denotes, or refers to. (An alternative might be to think of the interpretation as the process you have to go to work out the number denoted by the expression. This would be close to Frege's notion of 'sense'. In that case, "four" and "two * (one + one)" would not have the same interpretation.) How can we organise the derivation of the reference? Fortunately, the parse trees produced, given the grammar above, provide just the information that is needed for determining the reference of phrases in the language. For instance, the phrase "two + (three * four)" gives rise to a parse tree with three immediate subtrees - those corresponding to "two", "+" and "(three * four)". The presence of the "+" signals to us that the reference of the whole phrase is to be determined by adding together the references of the two subphrases. In saying this we are saying something about the meaning of '+'. In fact, we are assuming a procedural interpretation of its meaning, something like its SENSE. You could think of it as having a reference, which is the (infinite) set of all possible triples consisting of two numbers and their sum. Then, finding the reference of "two + three" might involve searching the set for a triple starting with the references of "two" and "three", and taking the third element as the result. Instead, we just assume the ability to add! How can we design a program to 'evaluate' an arithmetical expression? We want a procedure INTERP to look at the top level of the parse tree. This should be a tree for a grammatically correct "expr". It will, as usual, start with a symbol saying what kind of thing it is the parse tree for. We can then dig out the sub-tree and use another procedure to interpret it, called INTERP_EXPR: define interp(phrase); vars expr; phrase --> [expr ??expr]; interp_expr(expr) enddefine; Notice that we will keep to certain conventions throughout this program. There will be one procedure that "knows about" each syntactic category and how references of instances of that category can be obtained. Since all "references" here are numbers, these procedures will generally receive bits of parse trees as their arguments and produce numbers as their results. In the above definition of INTERP, a numerical result is intended - it is simply the result of the call of INTERP_EXPR. Now for the definition of INTERP_EXPR. Looking back at the grammar, we can see that there are three possible cases for an "expr". The definition will have to cope with each of these: define interp_expr(phrase); vars e1 e2; if phrase matches [[simple ??e1]] then interp_simple(e1) elseif phrase matches [[simple ??e1] + [simple ??e2]] then interp_simple(e1) + interp_simple(e2) elseif phrase matches [[simple ??e1] * [simple ??e2]] then interp_simple(e1) * interp_simple(e2) endif; enddefine; Notice how we are making use of a procedural interpretation of the arithmetic operators. That is, for now, we are not defining procedures called INTERP_PLUS, INTERP_TIMES, but simply using the standard procedures for adding or multiplying. This definition makes use of another procedure (yet to be defined) INTERP_SIMPLE, which can extract the reference from phrases of the category "simple". Notice that (because of how INTERP works), the argument of INTERP_EXPR is an "expr" tree with the initial "expr" removed. For instance, if the initial argument given to INTERP was: [expr [simple [number two]] + [simple [number three]] ] then the tree fragment given to INTERP_EXPR would be [ [simple [number two]] + [simple [number three]] ] Similarly, the argument to INTERP_SIMPLE is a "simple" tree with its head cut off. define interp_simple(phrase); vars e n; if phrase matches [[number ?n]] then interp_number(n) elseif phrase matches [( [expr ??e] ) ] then interp_expr(e) endif enddefine; Notice how INTERP_SIMPLE calls INTERP_EXPR, and also INTERP_EXPR calls INTERP_SIMPLE. This phenomenon, where two things are defined each in terms of the other, is called MUTUAL RECURSION. Because of the mutual recursion, INTERP_SIMPLE (and INTERP_EXPR) may end up indirectly calling itself. However, the program is guaranteed to terminate because each time a procedure is called from within another, the "phrase" argument of the new procedure is smaller than the original one. Since we cannot go on indefinitely finding smaller and smaller lists, the succession of one procedure calling another calling another.... must eventually stop. In fact, the sequence of procedure calls will finally stop when it gets to a word specifying a number. We need to provide one more procedure, to cover this case: define interp_number(word); if word = "one" then 1 elseif word = "two" then 2 elseif word = "three" then 3 elseif word = "four" then 4 elseif......... ........... endif enddefine; Exercise 1: ----------- Type in definitions of the above procedure and try them out on some examples, eg. : TRACE EXPR; : REPEAT 6 TIMES INTERP(EXPR(READLINE())) => ENDREPEAT; ** .... You could trace INTERP and the procedures it uses, also. Exercise 2. ----------- Extend the grammar and semantic interpretation procedures so that subtraction and division can also be handled in the language. How would the grammar have to be extended if we wanted unambiguous expressions to be handled even if brackets were not provided everywhere (eg. phrases like "two + one * five")? Exercise 3. ----------- Write a short explanation of how the procedures work, using trace printing to illustrate. Talking about Numbers in English -------------------------------- The previous section dealt with an artificial language for talking about numbers. We will now look at how numbers are talked about in English. Of course, one way we can talk about numbers is just to write down the figures. That is not very interesting. More interesting is how we put together complex phrases like: twelve thousand three hundred and ninety nine The way we specify numbers by sequences of words is not random, but obeys certain implicit rules. These rules determine both the syntax of the phrases (which sequences of words are legal) and the semantics of the phrases (what they mean). Let us look at the syntax first. The words representing the numbers from 1 to 9 seem to belong to a special category in the grammar of English numbers. These numbers can appear in any of the contexts represented as '...': twenty ... a hundred and ... Let us call this category the category "digit". The words for the numbers from 10 to 19 also seem to fall naturally within a category. These words can appear in all but the second of the above contexts. Let us call this the category "teen". The phrases for the numbers from 21 to 29, 31 to 39, 41 to 49, ... 91 to 99 have a stereotyped form - one word followed by a "digit". We could invent a category for the words that start these - "twenty", "thirty", etc. - calling it "tens" say. And we can continue in this way, extending the grammar to cover as many numbers as we like. To some extent, the grammar we write is arbitrary, as long as it covers all the valid phrases and nothing else. Where there are options, we might be guided by which yields the most concise grammar. Here is a fragment of the grammar that might be produced: [ [number [undertwenty] [twentytohundred] .... [undertwenty [digit] [ten] [teen]] [twentytohundred [tens] [tens digit]] ... ] -> grammar; [ [digit one two three four five six seven eight nine] [tens ten twenty thirty forty .... ... ] -> lexicon; Exercise 4. ---------- Can you extend that grammar, by adding new forms of "number"? Now for the semantics. This is fairly straightforward, because every phrase and subphrase can be seen as talking about a number, just as in the previous language. We can start writing procedures as follows: define interp(phrase); vars num; phrase --> [number ??num]; interp_number(num) enddefine; NB you have to think carefully about the structure that ??num will pick up. It will have one more set of brackets than you might think. Consequently, the next procedure needs what you might think of as surplus brackets. Look carefully at the parse-trees produced by lib grammar, and be sure you understand the brackets! define interp_number(phrase); vars n; if phrase matches [[undertwenty ??n]] then interp_undertwenty(n) elseif phrase matches [[twentytohundred ??n]] then interp_tth(n) elseif ... ... enddefine; define interp_digit(word); if word = "one" then 1 elseif word = "two" then 2 elseif word = "three" then 3 ... enddefine; Exercise 5: ----------- Complete the number grammar, so that it covers all numbers from 1 to 999999 (this is not as bad as it sounds). Or, if you like, make a number grammar for another language, like French or German. Also, provide a complete set of interpretation procedures, so that a reference can be worked out for every phrase allowed by the grammar. GENERATING and TRANSLATING Another thing you might try is generating the English phrase for a given number. This time, you can again have a procedure associated with each grammatical category. The program might start something like: define generate_number(n)->list; if n < 20 then generate_undertwenty(n)->list elseif n < 100 then generate_twentytohundred(n)->list ...... enddefine; define generate_twentytohundred(n)->list; if ...10...n... then generate_tens(n) else generate_tens(...n...10...) <> generate_digit(...n...10...) endif enddefine; define generate_digit(n); if n = 1 then [one] elseif n = 2 then [two] ... enddefine; If your generating and interpreting procedures are working with different languages, you can use the combination to translate from one language to another. For instance, if INTERP interprets numbers expressed in French and GENERATE_NUMBER generates a phrase in English from a number, then the following procedure will translate from French into English: define translate(phrase)->newphrase; generate_number(interp(phrase))->newphrase enddefine; Exercise 6: ----------- Generalise one of the grammars so that instead of merely allowing phrases which denote numbers, they permit complete sentences, like four + four = (two * three) + two or four times seven equals twenty six Define procedures which will evaluate such sentences, using the parse trees as input. Exercise 7: ----------- Write a program which takes an expression in one of the above languages and instead of creating a parse tree computes the reference of the whole expression, i.e. using strategy A referred to above. You may find it useful to make use of the general form of the procedures described in the PARSING demo, but with suitable modifications. PROBLEM: What about translating from English into the "expression" language defined by our first grammar above? PROBLEM: What is the relationship between interpreting a sentence and interpreting an image? In both cases we can say that one sort of structure is related to some other object. Is there anything more to be said about the comparison? Is the notion of syntax relevant to pictures? Can pictures have a sense and a reference? PROBLEM: In general, building a representation of the reference of an expression will be inadequate to our ordinary notion of building a representation of the meaning, since what we ordinarily understand by 'meaning' is closer to sense than to reference. For instance, if you had to translate the phrase 'The Vice-Chancellor of Sussex University' into French, it would not do to find just any old French expression which happened to refer to Denys Wilkinson. E.g. using his name in the French version would not count as a translation, and it would be positively misleading in sentences like: 'The Vice-Chancellor of Sussex University chairs Senate meetings' for this refers to the role, not the person. The same problem arises concerning the translation of phrases in 'intensional' contexts, e.g. 'Fred believes that the Vice Chancellor is a physicist' The V-C may actually be the man who often sits next to Fred on the bus, and chats to him, about the weather and politics. It doesn't follow that the above sentence has the same meaning as: 'Fred believes that the man on the bus is a physicist' even if the two bits which differ have the same reference. In fact, one could assert a true proposition whilst the other is false. Try thinking of a way of representing the meaning which would deal with these problems. CM and AS -9- Jan 1982