TEACH GRAM1                                      Aaron Sloman October 2011

This is a truncated version of TEACH GRAMMAR, for more experienced, or
more busy, or more impatient, learners.

There will be some sequels:
    TEACH GRAM2
    TEACH GRAM3
    ...

First look at the tiny introduction to editing commands if you have not
previously done so:
    teach minived

GENERATING AND ANALYSING SENTENCES (PART 1)
------------------------------------------

         CONTENTS - (Use ENTER g to access required sections)

 -- You need to know a little about grammar
 -- A sentence is an 'np' followed by a 'vp'
 -- There are other syntactic categories
 -- Types of sentences can be described with grammars
 -- Words have categories too
 -- Now print out the grammar and the lexicon
 -- How to generate sentences
 -- Exercise 1: add more words to the lexicon
 -- Exercise 2: add adjectives to the grammar
 -- Exercise 3: railway station announcements

=============================================================================

The LIB GRAMMAR library package makes available some programs which you
can use to explore sentence structures. This teach file introduces you to
the notion of a formal grammar for a simplified subset of English and
shows you how to use the library program to generate or analyse sentences
according to a grammar.

To use this file you need to know how to read and manipulate files in the
Poplog editor Ved.

REMINDERS:
    Use DownArrow to scroll down -- Use UpArrow to scroll up
    Use Left and Right Arrow keys to move the Ved text cursor in a line
    Use PageDown and PageUp to move a screenful at a time, up or down.

To compile (load) a marked range do
    ENTER lmr RETURN

To compile only one line, the line containing the text cursor, do
    ESC d       (Press ESC key, release it, tap 'd')

For revision on editing commands do
    ENTER teach minived RETURN

Now read on and play.
=============================================================================

-- You need to know a little about grammar -----------------------------

The following sections assume you know a bit about grammar.

1. Nouns are words like "cat" "dog" "car" "number".

2. Noun phrases are expressions like "the old man", "the dog that bit
   the cat", "each little lady who likes programming". We use NP as an
   abbreviation for "noun phrase". Each noun phrase typically refers to
   something.

3. Verbs are words like "hit", "look", "choose", "put", "turn". Some
   verbs are transitive, and are followed by a noun or noun phrase,
   whereas others are intransitive. In "Bill hit Fred", the verb "hit"
   is transitive and has "Fred" as its direct object, whereas in
   "Bill smiled" the verb "smiled" is intransitive: it has no object.

4. Verb phrases (abbreviated as VP) include things like "hit me",
   "looked at the old man", "chose the book in the car", "put the dog in
   the car in the box". A verb phrase typically says something about an
   object referred to in a noun phrase. So it normally needs to be
   preceded by a noun phrase to form a complete assertive sentence, as in

        [[the big boy] [hit me]]
        [[Mary] [looked at the old man]]
        [[the dog in the box in the corner][barked at the big grey cat]]

5. Prepositions are words like "at", "in", "inside", "over", "under",
   which are typically followed by a noun phrase to form a prepositional
   phrase, such as "at the house", "in the box", "inside the garden".

-- A sentence is an 'np' followed by a 'vp' ----------------------------

We introduce a simplified notion of declarative sentence.
By joining an NP to a VP (which may contain embedded NPs) we can form a
sentence (abbreviated S), for example:

    [NP the cat] [VP bit the girl]
    [NP the man in the car] [VP looked at the little dog]
    [NP he] [VP jumped]
    [NP Mary] [VP smiled at John]
    [NP Every student] [VP enjoys programming]

Try creating some more example sentences of your own, and dividing them
into noun phrases and verb phrases.

-- There are other syntactic categories --------------------------------

In addition to nouns and verbs those examples used other types of words,
including

    determiners (abbreviated DET), like: "the", "each", "some"
    adjectives (ADJ) like: "old" "little"
    prepositions (PREP) like: "in" "at" "on" "under"

We've also used different sorts of verbs - transitive verbs which require
a following NP (e.g. "bit") and intransitive verbs which don't (e.g.
"jumped"). Some verbs may allow or even require the use of an associated
preposition, e.g.

    jump over NP
    put NP in NP
    look at NP

Note that even if there were only one rule for forming sentences, namely
to combine an NP with a VP, there could still be many different types of
sentences because NPs can have different forms and VPs can have different
forms.

In fact there are many different ways of forming sentences. E.g. if S is
a sentence so is

    "It is not the case that S"

If S1 and S2 are sentences so are:

    S1 and S2
    S1 or S2
    if S1 then S2

and so on. These are compound or molecular sentences, which contain
sentences as components. Atomic sentences do not contain sentences as
components.

-- Types of sentences can be described with grammars -------------------

The different forms of sentences can be described economically by means
of a grammar and a lexicon. The grammar specifies the types of sentences,
noun phrases, verb phrases, prepositional phrases, and so on that are
allowed by the language, and the lexicon indicates which words can be
used in different roles in the sentence.
If you give the computer a set of rules defining a grammar and a lexicon,
it can show you what sorts of sentences it generates. First you have to
define a grammar.

Type in the following definition of a grammar, preferably using the
editor to store it in a file called 'mygram.p'. (You can leave out the
"comments" following three semicolons ";;;" - the three semicolons tell
POP11 to ignore the rest of that line. They are included here simply to
help you see what's going on).

We declare a variable MYGRAM which will be given a list of rules as its
value. After the declaration, type in the list of rules (in this case a
list of lists of lists!). NB don't forget '.p' in the file name.

    ;;; declare a variable mygram, and initialise it with a list of
    ;;; rules for sentence components

    vars mygram =
        [                       ;;; start a list of rules
            ;;; a sentence is a NP then a VP
            [s [np vp]]
            [np [snp] [snp pp]] ;;; a NP is either a simple NP
                                ;;; or a simple NP followed by
                                ;;; a prepositional phrase PP
            [snp [det noun]]    ;;; a simple NP is a determiner followed by
                                ;;; a noun
            [pp [prep snp]]     ;;; a PP is a preposition
                                ;;; followed by a simple NP.
            [vp [verb np]]      ;;; verbphrase = verb followed by NP
        ] ;

Using the function keys F1 and F2 (and the editor move keys) mark the
lines from 'vars mygram' to '];'. The 'marked range' is indicated on the
left.

If F1 and F2 do not start and end a range you can use

    ESC m
    ESC M

for mark start and mark end.

After marking, do CTRL d, which will compile the range.

Then compile just this line (mark it using F1 and F2, then CTRL d to
compile):

    mygram ==>

It will print out your grammar as a list of lists, omitting the comments
(the text starting ';;;').

You now have a grammar, and need a lexicon.

-- Words have categories too -------------------------------------------

Now you can type in a lexicon, to tell the system about nouns, verbs,
prepositions, and determiners.

    ;;; Declare a variable mylex, and initialise it with a list of rules
    ;;; specifying lexical categories, i.e.
    ;;; types of words

    vars mylex =
        [       ;;; start a list of lexical categories
            [noun   man girl number computer cup battle room car garage]
            [verb   hated stroked kissed teased married taught added]
            [prep   in into on above under beside]
            [det    the a every each one some]
        ];

Using F1 and F2 (or ESC m and ESC M) mark the code from "vars" to ";".
Then compile it (CTRL d).

Then compile this line to show the value of the variable mylex

    mylex ==>

If you feel brave add some more nouns and verbs, by editing the text
above, and do it all again.

-- Now print out the grammar and the lexicon --------------------------

Compile those two definitions. You can now print out your grammar, using
the Pop11 'pretty-print' arrow '==>' thus:

    mygram ==>

and print out your lexicon

    mylex ==>

The "pretty print" arrow "==>" makes long lists print out neatly.

-- How to generate sentences -------------------------------------------

Now make available the Pop11 grammar library:

    uses grammar;

Compile that line (e.g. ESC d, or mark it first then CTRL d)

That library provides (among other things) a Pop11 procedure called
"generate".

Here's how you get the computer to generate (randomly) sentences
according to your grammar and lexicon (you will find that not all the
sentences generated actually look like good English. That is a sign that
the grammar is inadequate).

Try this command (mark and compile, or use ESC d):

    generate(mygram, mylex) ==>

Do it a few times, to see what happens.

You can make the computer generate many sentences, by using the Pop-11
"repeat" construction. (You can change the number then do it):

    repeat 20 times generate(mygram, mylex) ==> endrepeat;

-- Exercise 1: add more words to the lexicon --------------------------

You can now try extending the grammar and lexicon to generate a wider
range of sentences. Edit the grammar and the lexicon above.

Exercise 1.
Add more words in each category to the lexicon, and redo the above
"repeat....endrepeat" command to generate sentences.
Some of the sentences will make no sense, and many will appear quite
ungrammatical. Try to analyse what exactly is wrong with them.

-- Exercise 2: add adjectives to the grammar --------------------------

Exercise 2.
Try adding adjectives (e.g. "big", "pretty", "clever", "square", "red")
to the grammar.

    o First you will need to extend the lexicon (i.e. the list called
      "mylex") with a new category. Call the new category "adj". So you
      will have to add a new sublist of the form

            [adj ..... ]

      containing some adjectives.

    o Second you will need to extend the definition of the grammar so as
      to give a role for adjectives. The most plausible way to do this is
      to add a new sub-type of simple noun phrases (SNP), to cover noun
      phrases like

            "the red ball"  "every big car"  "some angry girl".

      The way to do this is to add a new sub-list to the "snp" list.

Try doing that, and then generate 20 more sentences looking to see which
ones now include adjectives based on your extension.

When you've had enough of generating sentences continue reading this
file.

-- Exercise 3: railway station announcements --------------------------

The sentences generated by your grammar and lexicon include a lot of
junk. Try starting again, and define a grammar and a lexicon that will
generate only sensible sentences suited to a particular context. E.g. you
could try one of these contexts:

    Railway train departure announcements
    Statements about tomorrow's weather
    A doctor describing a patient's health

Choose one of those and produce a new grammar and a lexicon and test them
with the "generate" procedure. You can introduce new types of grammatical
or lexical categories to suit the context. The only absolute requirement
is that the "top level" grammatical category should be "s", as the
procedure generate starts working down from the rules for "s".

HINT: in order to prevent nonsensical combinations of categories you may
find it useful to break the categories down into matched sub-categories.
For example you could distinguish animate nouns ("mary", "man"), and
inanimate nouns ("train", "platform"), and distinguish different sorts of
noun phrases depending on whether the nouns are animate or inanimate.
Then you could distinguish verbs that need an animate subject (ani_verbs)
from verbs that don't (inani_verbs). Then your rules could specify that
only animate nouns can go with animate verbs and vice versa.

You could make further subdivisions of the verbs according to what sorts
of prepositions are allowed to follow them.

If you do all this the labels for the types of words and phrases in the
grammar could get quite long and complicated. Instead of just "verb" you
might have something like "ani_verb_np_prep_np" to label a verb that
requires an animate subject and is followed by a direct object (the first
"np") then a preposition and an indirect object (the second "np"). Such a
word could be "put" in a sentence like:

    [mary put the fish into the oven]

    mary     - animate subject proper noun
    put      - verb of type ani_verb_np_prep_np
    the fish - direct object np
    into     - preposition
    the oven - indirect object np

Steps towards an answer to a railway announcement program.
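
One possible first step is to try the sub-category idea from the HINT on
a very small scale. The following is only a sketch: the names anigram and
anilex, the category labels (ani_np, inani_np, ani_vp, inani_vp,
ani_noun, inani_noun, ani_verb, inani_verb) and the word lists are all
invented here for illustration, and it assumes that "generate" treats
them in the same way as the categories used in mygram and mylex.

```pop11
;;; Illustrative sketch only: every name below is hypothetical.
uses grammar;

vars anigram =
    [
        ;;; a sentence pairs a subject with a matching kind of verb phrase
        [s [ani_np ani_vp] [inani_np inani_vp]]
        [ani_np [det ani_noun]]
        [inani_np [det inani_noun]]
        ;;; either kind of noun phrase may be the direct object
        [ani_vp [ani_verb ani_np] [ani_verb inani_np]]
        [inani_vp [inani_verb ani_np] [inani_verb inani_np]]
    ];

vars anilex =
    [
        [det the a every]
        [ani_noun man girl driver passenger]
        [inani_noun train platform carriage]
        [ani_verb boarded admired cleaned]
        [inani_verb carried delayed blocked]
    ];

;;; generate a few sentences to inspect by eye
repeat 5 times generate(anigram, anilex) ==> endrepeat;
```

With rules like these an animate subject should only ever be followed by
a verb from ani_verb, and an inanimate subject by a verb from inani_verb.
The cost is that each new distinction multiplies the number of rules and
labels, which is why the labels can get long.
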
Try the following grammar and lexicon:
(Mark and load these two specifications, using F1 and F2, then CTRL d)

    vars traingram =
        [
            [s [subject pred]]
            [subject [np] [np participle prepp]]
            [np [det train_noun] [det adj train_noun]]
            [prepp [prep np]]
            [pred [is adjphrase] [will verbphrase] [will verbphrase prepp]]
            [adjphrase [adv adj] [adj adv] [adj]]
            [verbphrase [verb_intransitive] [verb_intransitive adv]
                        [verb_trans np ]]
        ];

    vars trainlex =
        [
            [participle arriving departing standing loading stopping
                        waiting]
            [det every each the a some]
            [train_noun train carriage express sleeper service shuttle]
            [adj late early fast slow expensive reckless comfortable
                 delayed overloaded expected regular special next previous]
            [prep at from near alongside above]
            [adv slowly quickly soon punctually tardily]
            [verb_trans overtake follow delay obstruct carry]
            [verb_intransitive wait explode depart leave pause arrive
                               start crash finish]
        ];

Try this out:

    generate(traingram, trainlex) =>

This is still not a very sensible grammar. But it generates slightly
better sentences than the original. See if you can improve it to generate
good railway announcements.

Try producing a grammar and a lexicon for some other familiar set of
utterances, e.g. things you may say to describe some people at a dinner
table.

When you've had enough go to TEACH GRAM2, to learn how to use the grammar
to create a parser that can analyse the structures of sentences that
conform to the grammar.

    TEACH GRAM2 (may not be available when you first try)

--- $usepop/pop/teach/gram1
--- University of Birmingham 2011. All rights reserved. ------