It is possible to use Finite State Networks to recognise many acceptable English sentences. However, it can be shown that no Finite State Automaton can model some language constructs, for instance centre-embedded phrases. In addition, Finite State Automata descriptions of the syntax of natural language are repetitious and long-winded.
Linguists evaluate their theories by criteria of adequacy. A theory can be adequate at different levels: for instance, it might describe all possible sentences that a language includes (and no other sentences) but not give any insight into how a brain produces these sentences. If you want to produce a description of the sentences of a language, this is an adequate theory; if you intend to throw light on human linguistic processes, then the theory is inadequate.
Linguists constrain their theories by requiring them to be "barely sufficient". At a grand level, it is alleged that a roomful of chimpanzees typing into word processors would eventually produce the complete works of Shakespeare, simply by the coincidence of their having hit the right keys in the right order, but without understanding what they are typing. (Cynics, who are perhaps realists, may claim that the fictitious Buddhist monks working on the large-scale Tower of Hanoi problem are likely to finish their task before the chimps finish theirs.) An account of play-writing that stated that Shakespeare managed to complete his canon by the good luck of randomly writing words in some order that other people thought good is clearly a rotten theory: it explains nothing. What we want is a theory that is sufficiently constrained to fit what we know about the resources available. Shakespeare had only one lifetime of fifty-two years available to him. A theory to describe Shakespeare's ability would have to be constrained at least by time.
The idea of constraints or resources can be turned around. We know little about what the human brain does. We can make intelligent guesses at its capabilities by what it produces. Suppose we have three grammars that have differing mathematical power and different amounts of [computer] processing requirements:
These definitions may seem daunting, but we'll look at how we use them to evaluate Finite State Automata as models of English syntax and then their meaning will become clear.
Instead of labelling the arcs with pre-terminals, such as det, adj and noun, we'll use strings of letters to illustrate our examples. We could draw an FSTN that could recognise and generate the following strings:
This is a very unrestricted language: we can have any number of c and any number of e. However, suppose that we want to model a recogniser that will accept any of the following:
It may seem that strings of arbitrary letters have nothing to do with English. However, we can soon demonstrate that English includes sentences which are similar in structure. Consider a sentence such as:
We could go on extending this sentence indefinitely by simply embedding more and more phrases in the centre of this sentence. (This type of structure is known as centre-embedded.) From this we can conclude that it is impossible to model the entirety of English syntax with an FSTN, or indeed any Finite State Automaton.
To put it more briefly: Finite State Automata are mathematically inadequate as a model of English because they can't model centre-embedding.
It is possible to write descriptions of English grammar that include centre-embedded sentences, but this is the subject of another section.
Much reference has been made to the structure of English, but it would be wrong to think that anything in this section is restricted to English. English has no special position in linguistics: it is just another natural language. Linguists find the study of phenomena (such as centre-embedding) across languages interesting, if only because it enables them to make generalisations about the human language capability.
It is easy to see that this network contains repetitions. For instance, the sequence det noun appears three times and the sequence prep det noun appears twice. If we were writing programs (that is, writing a description of how to solve a computing problem), then we would procedurize. It is a mark of a poor program that code is repeated instead of being gathered into procedures. Procedurization is a mark of an elegant program: a program that is devoid of repetitions and avoids long-windedness.
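The repetition can be made concrete with a small sketch. The lexicon, the state numbering and the strictly linear shape of the network below are assumptions made for illustration (the network in these notes may differ in detail); the point is only that det noun has to be spelled out at every place it occurs.

```python
# Illustrative lexicon and state numbering: assumptions, not taken from the notes.
LEXICON = {
    "the": "det", "cat": "noun", "mat": "noun", "fire": "noun",
    "sat": "verb", "on": "prep", "by": "prep",
}

# A flat network for "the cat sat on the mat by the fire":
# one arc (source state, pre-terminal label, destination state) per word.
FLAT_ARCS = [
    (0, "det", 1), (1, "noun", 2), (2, "verb", 3),
    (3, "prep", 4), (4, "det", 5), (5, "noun", 6),
    (6, "prep", 7), (7, "det", 8), (8, "noun", 9),
]
FINAL_STATE = 9


def flat_accepts(sentence):
    """Follow one arc per word; fail as soon as no arc fits."""
    state = 0
    for word in sentence.split():
        category = LEXICON.get(word)
        targets = [dst for (src, label, dst) in FLAT_ARCS
                   if src == state and label == category]
        if not targets:
            return False
        state = targets[0]
    return state == FINAL_STATE


def count_sequence(labels):
    """Count how often a run of labels occurs along the (linear) network."""
    arc_labels = [label for (_, label, _) in FLAT_ARCS]
    n = len(labels)
    return sum(arc_labels[i:i + n] == labels
               for i in range(len(arc_labels) - n + 1))
```

Here `count_sequence(["det", "noun"])` gives 3 and `count_sequence(["prep", "det", "noun"])` gives 2, matching the counts mentioned in the text.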
The same kind of subjective criterion can be brought to bear on grammars. This FSTN is inelegant because it is repetitious and longer than necessary. If we have an insight into the structure of words that precede nouns (for instance, that it's also possible to have det adj noun sequences such as the tabby cat), this insight would have to be added to the network wherever there is a noun. If we had been able to procedurize our knowledge, we would only have to make that change once.
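As a rough sketch of what procedurizing might look like, the network can be written with named sub-networks that are each defined once. The names S, NP and PP, the "?" suffix for an optional item, the lexicon and the helper functions are all assumptions introduced here for illustration; they are not a notation used in these notes. Nothing in this sketch is recursive, so it is only a shorthand for a finite network, not a more powerful device.

```python
# Illustrative lexicon: an assumption, not taken from the notes.
LEXICON = {
    "the": "det", "tabby": "adj", "cat": "noun", "mat": "noun",
    "fire": "noun", "sat": "verb", "on": "prep", "by": "prep",
}

# Each sub-network is written once. A trailing "?" marks an optional item
# (a convention invented for this sketch).
NETWORKS = {
    "S": ["NP", "verb", "PP", "PP"],
    "PP": ["prep", "NP"],
    "NP": ["det", "adj?", "noun"],  # the det adj noun insight lives here only
}


def match(items, categories, pos):
    """Return the position reached after matching items, or None on failure.

    Simplification: a sub-network is matched greedily, which is enough for
    the small deterministic networks used here.
    """
    if not items:
        return pos
    head, rest = items[0], items[1:]
    optional = head.endswith("?")
    name = head.rstrip("?")
    if name in NETWORKS:
        end = match(NETWORKS[name], categories, pos)
        if end is not None:
            result = match(rest, categories, end)
            if result is not None:
                return result
    elif pos < len(categories) and categories[pos] == name:
        result = match(rest, categories, pos + 1)
        if result is not None:
            return result
    if optional:
        # Skip the optional item and carry on with the rest.
        return match(rest, categories, pos)
    return None


def accepts(sentence):
    """Accept a sentence if the S network consumes every word."""
    categories = [LEXICON.get(word) for word in sentence.split()]
    return match(NETWORKS["S"], categories, 0) == len(categories)
```

Because the optional adj is stated once, inside NP, sentences such as "the tabby cat sat on the mat by the fire" and "the cat sat on the tabby mat by the fire" are both accepted without touching any other part of the description.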
The joke about the optimist and the pessimist goes along these lines:
Optimist: This is the best of all possible worlds!
Ideas of adequacy
We must have some way of evaluating our statements about language. A key word used by linguists is adequacy. We have to be certain we know what we mean by adequacy. My desktop dictionary (an edition of The Concise Oxford Dictionary) has the following definition:
The idea of evaluating a theory by adequacy is that the theory should be sufficient to describe or explain the observed phenomenon. This much is obvious: we want a description to cover everything we're trying to describe, otherwise we would be unable to describe some things we know to exist. Think about the words: barely sufficient. The way you think about these words is similar to the joke about the optimist and the pessimist. If you are a pessimist, you'll view being described as "barely sufficient" as a negative way of looking at things: if you are an optimist, you'll see that being "barely sufficient" means that something may have been achieved with the minimum of resources.
Clearly the weakest theory is inadequate, but the strongest and median theories are both adequate. Which is to be preferred? You could argue that the median theory is to be preferred, because it is the barest theory. To put it another way, of the two contenders, it says most about the limitations of the brain. Of course, such a statement would have to be followed up by some rigorous research to find supporting evidence before the theory could be taken as acceptable.
Mathematical and notational adequacy
We could use a variety of views of adequacy to evaluate the syntactic grammars we'll describe in these notes, but we'll restrict ourselves to two views: mathematical adequacy and notational adequacy. These are defined by Gazdar and Mellish (1989) as:
Mathematical adequacy is concerned with whether the formal objects characterized by the notation, under the intended semantics, have the properties manifested in the real-world objects that the notation and its interpretation is intended to model.
Notational adequacy is to do with how elegantly the notation describes the real-world objects that the notation and its interpretation is intended to model.
The Mathematical adequacy of Finite State Automata
We can use Finite State Automata to write syntactic descriptions. The following is a Finite State Transition Network description of part of English. It could account for utterances such as:
but reject any of the following:
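To put it more succinctly, our recogniser must accept only input of the form c^a d e^a, where a > 0: some number of c, a single d, then exactly the same number of e. It is impossible to draw a finite-state recogniser that will recognise this language and only this language, because a finite number of states cannot keep count of an arbitrarily large number of c.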
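The contrast between the two languages can be sketched in code. The state names and function names below are invented for illustration. The first recogniser is a finite-state network for the unrestricted language (any number of c, one d, any number of e); the second accepts only matched strings, but it does so by keeping a counter that can grow without limit, which is exactly the resource a Finite State Automaton lacks.

```python
# Sketch only: state and function names are illustrative assumptions.

# Finite-state network for the unrestricted language:
# any number of c, then a single d, then any number of e.
TRANSITIONS = {
    ("start", "c"): "start",
    ("start", "d"): "after_d",
    ("after_d", "e"): "after_d",
}
FINAL_STATES = {"after_d"}


def fsa_accepts(string):
    """Run the network; its two states cannot record how many c were seen."""
    state = "start"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in FINAL_STATES


def counting_accepts(string):
    """Accept only a c's, one d, then a e's (a > 0), using an unbounded counter."""
    count = 0
    i = 0
    while i < len(string) and string[i] == "c":
        count += 1
        i += 1
    if count == 0 or i >= len(string) or string[i] != "d":
        return False
    # Everything after the d must be exactly `count` copies of e.
    return string[i + 1:] == "e" * count
```

The finite-state version accepts the matched string "ccdee" but also the unmatched "ccde"; the counting version accepts the first and rejects the second. Adding more states to the network would let it count up to some fixed limit, but never to an arbitrary one.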
The girl whose mother told me that she'd been painted by Van Gogh at the party shouted.
We can split this into several parts:
The Notational adequacy of Finite State Automata
Consider an FSTN for the sentence the cat sat on the mat by the fire. It might be drawn as:
Summary
Linguists use various kinds of adequacy as criteria with which to evaluate their theories. Of these, mathematical adequacy and notational adequacy were used to evaluate Finite State Automata, which were found to be inadequate because (mathematically) they are unable to describe some of the structures of English (and other natural languages) and because (notationally) they are inelegant.
Pessimist: That's true!
© P.J.Hancox@bham.ac.uk