Using the grammar and lexicon presented in the section on the terminology of syntax and context-free grammars, we show how phrase-structure trees can be developed top-down or bottom-up. This is presented as search. Using the sentence they are flying planes, we show that the search can be controlled either depth-first or breadth-first.
Search strategies
The grammar presented in the section on the terminology of syntax and context-free grammars was:
s -> np vpvp -> verb_group np
vp -> verb npverb_group -> aux verb
np -> pronoun
np -> noun
np -> adj noun
The lexicon was:
adj -> flyingaux -> are
noun -> planes
pronoun -> they
verb -> are
verb -> flying
We have seen that we can derive phrase-structure trees from the grammar for the sentence they are flying planes. In this section we will describe how to search for solutions in the form of phrase-structure trees. There are two main search strategies: top-down and bottom-up.
We can then draw in the non-terminals that are dominated by S:
We then pick one of the dangling non-terminals and expand it:
We then repeat the process by expanding another non-terminal:
and another:
until we have completed the expansion of all non-terminals.
We can then try to add a non-terminal that dominates one or more "lower" nodes:
We then repeat the process:
and again:
and again:
until we can add the distinguished symbol to the root of the tree:
Control is the name we give to the process of handling alternatives in search. There are essentially two kinds of control: depth-first control and breadth-first control.
To give an example, consider the top-down search when it has got as far as:
There are two VP rules which can be used:
When the aux and verb have been consumed, the alternatives rules for np are placed on the stack:
Each alternative would be popped from the stack in turn, until there only remains:
It is only now that this alternative VP rule can be used.
There are two VP rules which can be used:
The first entry is removed from the queue and expanded, but the new entry goes at the end of the queue:
This means that the next entry to be expanded is the alternative VP rule, not the verb_group. As an exercise, work out the order in which top-down depth-first and top-down breadth-first search algorithms would find the two trees for they are flying planes, and then compare your solution with mine.
The top-down search strategy
The top-down search strategy is sometimes known as hypothesis-driven search, because it essentially operates by proposing that the input string (in this example, they are flying planes), is covered by the distinguished symbol of the grammar. In the example given above, we would start by proposing an S as the distinguished symbol:
The bottom-up search strategy
The bottom-up search strategy is sometimes known as data-driven search, because it essentially operates by building upwards from the input string (in this example, they are flying planes), to the distinguished symbol of the grammar. This example is rather an informal description:
Control of the search strategy
As we saw in another section, there is more than one phrase-structure tree that may be derived from the grammar when applied to they are flying planes. It is likely that we will want our NLP systems to find all trees of globally ambiguous sentences such as this. (We may want to choose between the trees on the basis of some other kind of information, such as semantic information.) When we're writing algorithms, we have to have a method of ensuring that we've considered every possibility, that is, we have to be sure that the output of the parser is complete.
Depth-first control of the search strategy
Control is the process of handling alternatives in search: depth-first control pursues one alternative as far as possible until it is successful or blocks. Only then does it consider the next alternative. This is the control used in usual implementations of Prolog and it is often associated with the term chronological backtracking.
vp -> verb_group np
In depth-first control, the alternatives are placed on a stack and the first item popped from the stack. This is expanded by placing alternatives on the stack. So our stack would look like:
vp -> verb np
verb_group -> aux verb [+ np]
vp -> verb np
np -> pronoun
np -> noun
np -> adj noun
vp -> verb np
vp -> verb np
Breadth-first control of the search strategy
Breadth-first control uses a queue rather than a stack data structure. If we look again at the handling of the alternative VP rules, we'll see the difference.
vp -> verb_group np
vp -> verb np
vp -> verb np
verb_group -> aux verb [+ np]