TEACH PARSE Steven Hardy, November 1977 Revised A.Sloman Nov. 1989 PARSING NATURAL LANGUAGE ======================== CONTENTS - (Use g to access required sections) -- Introduction -- Design for a parser -- Defining np -- Defining qnoun -- Simplifying the programs by defining "parse" -- Going beyond a "bracketer" for sentence sub-structures -- Making the grammar more restrictive -- Further Reading -- Introduction ------------------------------------------------------- This teach file sketches out a simple 'top down' parser for a subset of sentences of a very simplified version of English. It is based in part on the ideas in TEACH GRAMMAR, and is intended to help the reader understand Terry Winograd's SHRDLU system. The procedure -generate- illustrated in TEACH GRAMMAR generates random sentences. Wherever the grammar offers a choice to the sentence generator, -generate- picks one at random. Obviously, we don't do this in normal conversation; the choices we make are constrained by what we want the final sentence to mean. A parser is a program that takes in a sentence and produces a description of the sentence in terms of the grammatical sub-structures used in the sentence. One justification for designing a parser is the assumption that when we hear a sentence we should try to discover the sequence of syntactic choices it represents as these ought to help us discover the 'meaning' of the sentence. One way to study this 'parsing' would be to write a program that works like -generate- in reverse. I.e. the new program should use a finished sentence as a guide, and analyse its structure, rather than make random choices. For example: [the big man loved a happy girl] -> input; sentence() => ** [[the [big man]] [loved [a [happy girl]]]] In some cases it may be possible to analyse the sentence in two different ways, with different meanings corresponding to them. E.g. should [they are eating apples] be analysed as [[they] [are] [eating apples]] (i.e. the apples are for eating) or as [[they] [are eating] apples]] (i.e. the people are eating apples) It may not be possible to decide between two such analyses on the basis of grammatical rules alone. However, this kind of ambiguity will be ignored in what follows. -- Design for a parser ------------------------------------------------ The parser described below has the following properties: 1) Each syntactic class, like NP and QNOUN, has a separate parsing procedure associated with it. 2) These procedures 'read' an instance of the appropriate class from the INPUT and return a suitable bracketed representation of that instance 3) If they are unable to do this (perhaps because the INPUT is syntactically incorrect) the parsing procedures are to leave the INPUT untouched and return FALSE. We will find it useful to have a procedure to READ a word from the INPUT: define read() -> result; if input = [] then false -> result else hd(input) -> result; tl(input)-> input endif enddefine; Notice that this uses -input- as a NON-LOCAL variable. In general this is a dangerous thing to do, as it can lead to unexpected interactions between programs. There are techniques for reducing this sort of interaction by using "sections" (HELP * SECTIONS) or "lexical" scoping of variables (HELP * LEXICAL). However, explaining these techniques would be a distraction here, so the problem is ignored in this file. We can now use the procedure -read- to parse, say, an adjective: define adj() -> result; vars saved, temp; ;;; Save the input in case it has to be restored input -> saved; false -> result; read() -> temp; if member(temp, [big happy ...]) then temp -> result endif; unless result do ;;; If result is false, -adj- has failed, so restore input saved -> input endunless enddefine; This must look horrific - need the procedure really be so complicated? The straightforward answer is 'no' because I've made the procedure fit a general pattern thus: define name() -> result; enddefine; -- Defining np -------------------------------------------------------- Using these conventions we could also write a program to parse a noun phrase: define np() -> result; vars saved, t1,t2; input -> saved; article() -> t1; qnoun() -> t2; if t1 and t2 then [^t1 ^t2] -> result endif; unless result do saved -> input endunless; enddefine; Notice that this procedure wastefully tries to parse a QNOUN even if it failed to parse an ARTICLE. -- Defining qnoun ----------------------------------------------------- The QNOUN procedure is a little tricky to write since there are two alternatives for a qualified noun: define qnoun() -> result; vars saved, t1,t2; input -> saved; noun() -> t1; if t1 then t1 -> result else saved -> input; adj() -> t1; if t1 then qnoun() -> t2; if t2 then [^t1 ^t2] -> result endif endif endif; unless result then saved -> input endunless enddefine; This procedure tries its first alternative (a noun); if that succeeds all is well otherwise QNOUN resets the input (unnecessarily in this case) and tries its second alternative. -- Simplifying the programs by defining "parse" ----------------------- I don't really like the above programs - they are far too long and complicated. Perhaps we could simplify them a bit by putting some of the common code to a separate procedure, thus: define parse(type) -> result; ;;; The input "type" is a procedure that should parse a certain ;;; type of word, phrase, clause, etc vars saved; input -> saved; false -> result; type() -> result; unless result then saved -> input endunless enddefine; We can use this procedure to rewrite our qualified noun recognizer: define qnoun() -> result; vars t1 t2; parse(noun)-> result; if result then return endif; saved -> input; parse(adj) -> t1; unless t1 then return endunless; parse(qnoun) -> t2; unless t2 then return endunless; [^t1 ^t2] -> result; enddefine; Here is the procedure to parse noun phrases: define np() -> result; vars t1, t2; parse(article) -> t1; unless t1 then return endunless; parse(qnoun) -> t2; unless t2 then return endunless; [^t1 ^t2] -> result enddefine; Notice that this version of NP is 'cleverer' than before - it 'gives up' straightaway if it doesn't find an ARTICLE. -- Going beyond a "bracketer" for sentence sub-structures ------------- By now you ought to have an idea of how to complete a 'bracketer' for simple English sentences. However, a good parser can do more than this one task. E.g. try modifying the definition of -adj- so that instead of simply returning the adjective found, parse(adj) returns a list containing the adjective preceded by the label, adj, e.g. parse(adj) => [adj red] Similarly parse(noun) should return structures like [noun man]. Then you can modify qnoun so that it returns structures like [qnoun [adj big] [qnoun [noun man]]] [qnoun [adj big] [qnoun [adj old] [qnoun [noun man]]]] E.g. explore the following possibility: define qnoun() -> result; vars t1 t2; parse(noun)-> result; if result then [qnoun ^result] -> result; return endif; saved -> input; parse(adj) -> t1; unless t1 then return endunless; parse(qnoun) -> t2; unless t2 then return endunless; [qnoun ^t1 ^t2] -> result; enddefine; What kind of structure will this produce for "the big old man" -- Making the grammar more restrictive -------------------------------- In TEACH GRAMMAR, it is suggested that it may be possible to modify the grammar to prevent silly sentences being accepted as grammatical. How might you prevent sentences like this being accepted? the tiny ant lifted the big red small lorry Could this be done by adding "features" to grammatical structures and requiring features to match up properly? Here are some suggestions to explore. We ought to give the parsing procedures a list of expected features and expect back a list of actual features. Some check on these not being inconsistent should be made, for example the word ANT could have features: [[type noun] [class animate] [strength low]] LORRY might be [[type noun] [class inanimate] [weight high]] The verb LIFT might have associated with it some advice to the effect that the STRENGTH of its subject ought to 'match' the WEIGHT of its object. If this is too hard we ought, at last, to expect the various adjectives in a noun phrase to not have contradictory features. For example knowing that BLUE has property [COLOUR BLUE] and RED has [COLOUR RED] ought to stop our parser accepting THE RED BLUE BALL. This discussion of parsing has completely omitted any mention of what one ought to do with ungrammatical input. This is obviously a problem since, in practise, much of what we hear is ungrammatical. Enabling a parser to do something sensible with it is a hard problem. -- Further Reading ---------------------------------------------------- TEACH * GRAMMAR TEACH * ISASENT TEACH * PARSESENT TEACH * TRANSSENT TEACH * PARSING HELP * FACETS Explains a library program for associating meanings with substructures in a sentence. HELP * TPARSE Describes a library program that can find all the different ways of parsing a sentence in accordance with a given grammar. --- C.all/teach/parse ------------------------------------------------ --- Copyright University of Sussex 1990. All rights reserved. ----------