Definite Clause Grammars

This part is about a way of writing syntactic recognisers and parsers directly in Prolog. This is used as an introduction to syntactic processing.

Top-down rules in Prolog

Prolog has an inbuilt search mechanism which is top-down. This search mechanism uses depth-first control.

We can write simple Prolog rules that describe a syntactic grammar.

	s(Start, Remainder) :- 
	     np(Start, Intermediate),
	     vp(Intermediate, Remainder).

	vp(Start, Remainder) :- 
	     verb_group(Start, Intermediate),
	     np(Intermediate, Remainder).

	vp(Start, Remainder) :- 
	     verb(Start, Intermediate),
	     np(Intermediate, Remainder).

	verb_group(Start, Remainder) :- 
	     aux(Start, Intermediate),
	     verb(Intermediate, Remainder).

	np(Start, Remainder) :-
	     pronoun(Start, Remainder).

	np(Start, Remainder) :-
	     noun(Start, Remainder).

	np(Start, Remainder) :-
	     adj(Start, Intermediate),
	     noun(Intermediate, Remainder).

	adj([flying|Rest], Rest).

	aux([are|Rest], Rest).

	noun([planes|Rest], Rest).

	pronoun([they|Rest], Rest).

	verb([are|Rest], Rest).
	verb([flying|Rest], Rest).
Each clause has a pair of arguments. It is important that these arguments appear in pairs, because they are used in pairs. The technical name for these arguments is difference lists. As their name suggests, the arguments are instantiated to lists, one list being the difference of the other list with one or more elements removed. For instance, if we look at the pronoun/2 fact, the arguments might be instantiated to:
	pronoun([they|are,flying,planes], [are,flying,planes]).
Another way of describing what is happening is to say that the second argument is the same as the first argument, except the first element has been "gobbled-up". Needless to say, "gobbled-up" is not a technical term in NLP, but we usually speak of input being "consumed".

This grammar can be run as a program by typing:

	| ?- s([they,are,flying,planes], []).
The first argument includes all the words to be parsed: the second argument represents how many words we want left when the processing is completed. We could trace the program to show how the arguments are instantiated while the program runs, but this would be a little too lengthy for our purposes. So, we will use the example of a smaller phrase, such as a np. We will use the phrase flying planes as an example.
	Clause no	Depth		Action	Goal
	1		1		call		np([flying,planes],[]) 
	2		2		call			pronoun([flying,planes],[]) 
	2		2		fail			pronoun([flying,planes],[]) 
	3		2		call			noun([flying,planes],[]) 
	3		2		fail			noun([flying,planes],[]) 
	4		2		call			adj([flying,planes],_1183) 
	4		2		exit			adj([flying,planes],[planes]) 
	5		2		call			noun([planes],[]) 
	5		2		exit			noun([planes],[]) 
	1		1		exit		np([flying,planes],[]) 
The interesting lines are those that are labelled exit. It is here that the difference lists can be seen consuming input.


Definite Clause Grammars

Writing rules using difference lists is tedious because you have to be certain that you get every single difference list correct: one wrong variable name, one mis-typing and the program will fail to work as intended. Fortunately most implementations of Prolog include an extension that allows us to write grammars such as these more succinctly. This extension is known as the Definite Clause Grammar formalism.

In essence, Prolog adds the extra arguments automatically. A Definite Clause Grammar (DCG) version of the first rule in the example above would be written as:

	s --> 
	     np, 
	     vp.
"Dictionary entries" are written in a more esoteric form. For instance, the fact pronoun([they|Rest], Rest). is written as:
	pronoun --> [they].
We are now in a position where we can rewrite the whole grammar.
	s --> 
	     np,
	     vp.

	vp --> 
	     verb_group,
	     np.

	vp --> 
	     verb,
	     np.

	verb_group --> 
	     aux,
	     verb.

	np --> 
	     pronoun.

	np --> 
	     noun.

	np --> 
	     adj,
	     noun.

	adj --> [flying].

	aux --> [are].

	noun --> [planes].

	pronoun --> [they].

	verb --> [are].
	verb --> [flying].

Parsing with DCGs

The program given above is a recogniser. Consider the following example:
	| ?- s([they,are,flying,planes],[]).
	yes
We merely get a yes/no response> This is the behaviour of a recogniser. For a parser, we need to be able to build a representation of the structure of sentences. In DCGs this is easy: we simply add an extra argument that is instantiated to the syntactic structure.

At the "bottom", that is the terminals and pre-terminals, we add a small tree to dictionary entries. We want to represent the following part of the tree:

We can represent the same information as a Prolog structure:

	pronoun(they)
We need to add an argument to each clause in represent the tree. This is how it is done for pre-terminals:
	adj(adj(flying)) --> [flying].

	aux(aux(are)) --> [are].

	noun(noun(planes)) --> [planes].

	pronoun(pronoun(they)) --> [they].

	verb(verb(are)) --> [are].
	verb(verb(flying)) --> [flying].
When we add the arguments to the non-terminal rules, we have to be more subtle. Consider the tree which we want to build, for instance for the noun phrase that dominates the pronoun they. In a Prolog form, the tree will look like:
	np(pronoun(they))
Now, we get the pronoun(they) part of the tree from the pre-terminal rule, so we can use a structure such as:
	np(Pronoun)
We then unify the Pronoun with the pronoun(they) that we obtain from the pre-terminal rule. We can now rewrite the rule to include an argument to represent the parse tree:
	np(np(Pronoun)) --> 
	     pronoun(Pronoun).
Assuming you have loaded the modified version of the pre-terminals given above into a file together with this new np rule, you can see the tree built by this rule by using the query:
	| ?- np(Tree, [they],[]).

We can add an extra argument to each of the non-terminal rules. Notice how the tree is represented in rules which have more than one subgoal, such as the s rule.

	s(s(NP)) --> 
	     np(NP),
	     vp(VP).

	vp(vp(VG, NP)) --> 
	     verb_group(VG),
	     np(NP).

	vp(vp(Verb, NP)) --> 
	     verb(Verb),
	     np(NP).

	verb_group(verb_group(Aux, Verb)) --> 
	     aux(Aux),
	     verb(Verb).

	np(np(Noun)) --> 
	     noun(Noun).

	np(np(Adj, Noun)) --> 
	     adj(Adj),
	     noun(Noun).
This program can be run using the query:
	| ?- s(Tree, [they,are,flying,planes],[]).
There should be two phrase structure trees for this query. Check you know how they differ, perhaps by drawing the trees as a picture.


© P.J.Hancox@bham.ac.uk