A brief history of Natural Language Processing

NLP is almost as old as the digital computer. For a few years after 1945, the use of the computer for calculating artillery tables and code-breaking was less pressing, and peacetime gave researchers the opportunity to let their imaginations roam over new applications. Until about 1960, it was quite feasible to write definitive histories of NLP, with reviews of any and all significant work. Since then there has been so much research that it is no longer reasonable to write exhaustive histories. It is still possible, however, to pick out the most influential systems and the trends that emerged, and this is what this section does.

The prehistory of NLP

Proposals for mechanical translators of languages pre-date the invention of the digital computer. The first recognisable NLP application was a dictionary look-up system developed at Birkbeck College, London in 1948. American interest is generally dated to a memorandum written by Warren Weaver in 1949. Weaver had been involved in code-breaking during the Second World War. His idea was simple: given that humans of all nations are much the same (in spite of speaking a variety of languages), a document in one language could be viewed as having been written in code. Once this code was broken, it would be possible to output the document in another language. From this point of view, German was English in code.

As a research idea, this caught on quickly, with significant machine translation research groups set up in the USA, UK, France and the Soviet Union. Early American systems concentrated on the translation of German to English, because there were technical documents left over from the war to be translated. As time passed and the German material was seen as out-dated, the pairs of languages in the research projects moved to Russian to English, Russian to French or English to Russian and French to Russian. The Cold War had caught up with machine translation research.

Early machine translation systems were conspicuously unsuccessful. Even worse, they eventually brought the hostility of research funding agencies down upon the heads of NLP workers. Warren Weaver's memo of 1949 had inspired many projects, all of which were breaking new ground: there was no received wisdom in NLP, no body of knowledge and techniques to be applied. The early workers were often mathematicians who struggled with the primitive computing machinery. Some early workers were bilingual, for example native German speakers who had emigrated to the USA. Their knowledge of both of the languages handled by their system suggested that they would be able to write programs that would satisfactorily translate technical texts, at least. It soon became apparent that the task they had set themselves was extremely difficult. Language was far more complex than they had imagined. Worse still, although they were fluent speakers of their native language, it proved very difficult to encode their knowledge of the language in a computer program.

The obvious place to look for help was Linguistics. The literature of the 1950s shows a growing awareness of work in mainstream Linguistics, and it became something of a trend for young researchers in Linguistics to join Machine Translation teams. While an openness to the contribution of related disciplines was to be welcomed, it is unclear that it helped Machine Translation a great deal, because suitable linguistic theories simply did not exist. This changed in 1957 with the publication of Syntactic Structures by Noam Chomsky, the young American linguist who has dominated theoretical linguistics ever since. Chomsky has revolutionised linguistics, perhaps almost single-handedly. He introduced the idea of Generative Grammar: rule-based descriptions of syntactic structures. Although many have disagreed with Chomsky's ideas, producing alternative linguistic formalisms or taking issue with his methods of discovering linguistic data, almost all work in NLP since 1957 has been marked by his influence.

Early machine translation researchers realised that their systems could not translate the input texts without further assistance. Given the paucity of linguistic theories, especially before 1957, some people proposed that texts should be pre-edited to mark difficulties in the text, for instance to disambiguate ambiguous words (such as "ball" in English). As Machine Translation systems couldn't produce fluent output, the text in the "target" language would also have to be post-edited to polish it into a comprehensible text.

The introduction of pre- and post-editing of machine-translated texts brought with it the idea that the computer could be used as a tool to assist the human in tasks which were still too difficult for the computer to achieve on its own. In assisted machine translation, the computer acts as the memory, relieving the human of the need to know a vast amount of vocabulary. Bar-Hillel reviewed the field and concluded that Fully-Automatic High-Quality Translation (FAHQT) is impossible without knowledge. In his view, the methods used by the then-current projects, which in essence shuffled pairs of words, were inherently doomed to fail, even if extended significantly. The reason was simple: human translators add their understanding of the document to be translated to their knowledge of the structures of the languages they are working with. There remain some constructions that can be translated correctly only with an understanding of the document or of the way the world is. In a language like English, it is difficult to know what the speaker of a sentence like:

"She wore small shoes and socks."

intended. (Were the socks also small?) For many purposes it doesn't matter, but if the system were analysing witness statements to initiate a search, it could be crucial.

Bar-Hillel's comments have had a long-lasting influence on the perception of the practicality of NLP, and of Machine Translation in particular. The other damning factor was the over-selling of systems. Research projects have to secure long-term funding in order to keep research teams together. In a situation where there are many teams working in the same basic area, it is crucial to be seen to be making good progress. Sponsors like to see clear practical demonstrations of the results of their funding. Machine Translation suffered from over-selling itself up until the mid-1960s. This was not helped by the willingness of some of the press to put a (perhaps naive) optimistic gloss on any development. For Machine Translation, the demonstration of the Georgetown system on 7th January 1954 was just such an occasion. Looking back on this system with the hindsight of forty years, it seems an incredibly crude system that never had a hope of translating any but the most carefully chosen texts. At the time, it was greeted as the advent of practical Machine Translation.

US funding of Machine Translation research was reckoned to have cost the public purse $20 million by the mid-1960s. The Automatic Language Processing Advisory Committee (ALPAC) produced a report on the results of the funding and concluded that "there had been no machine translation of general scientific text, and none is in immediate prospect". US funding for machine translation was stopped, which also halted most associated NLP work outside Machine Translation. The knock-on effect was that funding stopped in other countries too, and NLP entered something of a dormant phase.

NLP from 1966 to 1980

Some histories suggest that NLP virtually disappeared from the scene after the ALPAC report. This is a romantic view which is not entirely supported by evidence. It is certainly true that there was much less NLP work, and that Machine Translation research substantially decreased for ten years or more. However, there were some significant developments and systems in the fifteen years after the ALPAC report, some of which are still influential today.

The key developments were:

Augmented Transition Networks
The Augmented Transition Network (ATN) is a piece of searching software that is capable of using very powerful grammars to process syntax. It would be wrong, however, to think of it as just a syntax processor or just a search algorithm. It provided a formalism for expressing knowledge about the domain of application, with the knowledge written in the form of extended transition networks. Alongside this was a specification of a way of using these networks to search for solutions to problems. In the case of NLP, the knowledge could be about the syntax of English sentences and the problem could be to produce parses of English sentences. The domain of application could equally be something completely different, for instance planning the movements of a robot in a warehouse.
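
The split between the knowledge (the networks) and the procedure that searches them can be sketched in a few lines of Python. This is only a minimal illustration and not a reconstruction of any historical system: the lexicon, the network layout, the state names and the register names are all invented here, and a full ATN would also attach arbitrary tests and actions to its arcs.

```python
# Toy lexicon mapping words to categories (invented for illustration).
LEXICON = {
    "the": "det", "a": "det",
    "robot": "noun", "box": "noun", "pyramid": "noun",
    "red": "adj", "blue": "adj",
    "lifts": "verb", "sees": "verb",
}

# The knowledge: each network maps a state to its outgoing arcs.
# ("CAT", category, register, next_state) consumes one word of that category.
# ("PUSH", network, register, next_state) traverses a sub-network recursively.
# ("POP",) means the network may finish in this state.
NETWORKS = {
    "S": {
        "S0": [("PUSH", "NP", "subject", "S1")],
        "S1": [("CAT", "verb", "verb", "S2")],
        "S2": [("PUSH", "NP", "object", "S3"), ("POP",)],
        "S3": [("POP",)],
    },
    "NP": {
        "NP0": [("CAT", "det", "det", "NP1")],
        "NP1": [("CAT", "adj", "adj", "NP1"), ("CAT", "noun", "head", "NP2")],
        "NP2": [("POP",)],
    },
}
START = {"S": "S0", "NP": "NP0"}


def traverse(net, state, words, pos, registers):
    """The search: yield (next_position, registers) for each way to finish `net`."""
    for arc in NETWORKS[net][state]:
        if arc[0] == "POP":
            yield pos, registers
        elif arc[0] == "CAT":
            _, category, register, next_state = arc
            if pos < len(words) and LEXICON.get(words[pos]) == category:
                updated = {**registers, register: words[pos]}
                yield from traverse(net, next_state, words, pos + 1, updated)
        elif arc[0] == "PUSH":
            _, subnet, register, next_state = arc
            for sub_pos, sub_regs in traverse(subnet, START[subnet], words, pos, {}):
                updated = {**registers, register: sub_regs}
                yield from traverse(net, next_state, words, sub_pos, updated)


def parse(sentence):
    """Return the registers of the first parse that uses every word, else None."""
    words = sentence.lower().split()
    for end, registers in traverse("S", START["S"], words, 0, {}):
        if end == len(words):
            return registers
    return None


print(parse("the robot lifts a red box"))
```

The `traverse` function knows nothing about English. Replacing `NETWORKS` and `LEXICON` with networks describing some other domain would reuse the same search unchanged, which is the sense in which the ATN is more than a syntax processor.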

Case Grammar
Case Grammar is an attractive account of an aspect of semantics. Languages such as English express the relationship between verbs and nouns mainly by the use of linking prepositions. Consider the following sentence:

John bought a ticket for Mary in the Symphony Hall Booking Office.

We know from the position of the words "John" and "ticket" that John is the agent instigating the action and that the ticket is the patient (or object) of the action. We know that Mary is the beneficiary of the action because of the use of the preposition "for" before her name. (What would have been the meaning of the sentence if that preposition had been "from"?) The location of the action was the Symphony Hall Booking Office, as is indicated by the use of the preposition "in".

Charles Fillmore noticed that some languages don't have prepositions, but can still encode the same kinds of meaning. An analysis showed that they used different methods to express this information, for instance, the use of differing word endings (grammatical case) to indicate the role that a noun was playing in relation to a verb. (In English, we have the remnant of this method in the possessive: "John's book", where the ending of "John" is changed to show the role it is playing in the sentence.) Other languages used rigid word order. Fillmore proposed that there are a very small number of "deep cases" which represent the possible relations between a verb and its nouns. Individual languages express these deep cases in a variety of ways, such as word order, prepositions and word inflection (i.e. changing the endings of words).

The significance of the proposal for NLP is that it offered a relatively easily implementable theory which could supply much semantic information with little processing effort. It also contributed to the solution of one of the intractable problems of Machine Translation: the translation of prepositions.
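
To give a feel for how little processing is involved, here is a minimal Python sketch that builds a case frame for the ticket sentence above. It assumes the sentence has already been split into subject, verb, direct object and prepositional phrases, which a real system would have to do itself. The function name and the table of prepositions are invented for this illustration, and the label "source" for the preposition "from" is an assumption of the sketch, not something taken from Fillmore.

```python
# Surface preposition -> deep case. The first two pairings follow the
# discussion of the ticket sentence; "source" is an assumed label.
PREPOSITION_CASES = {
    "for": "beneficiary",
    "in": "location",
    "from": "source",
}


def assign_cases(subject, verb, direct_object, prepositional_phrases):
    """Build a case frame from a sentence that has already been chunked.

    In English, word order signals the agent (before the verb) and the
    patient (after it); the remaining cases are signalled by prepositions.
    """
    frame = {"verb": verb, "agent": subject, "patient": direct_object}
    for preposition, noun_phrase in prepositional_phrases:
        case = PREPOSITION_CASES.get(preposition)
        if case is not None:
            frame[case] = noun_phrase
    return frame


frame = assign_cases(
    subject="John",
    verb="bought",
    direct_object="a ticket",
    prepositional_phrases=[
        ("for", "Mary"),
        ("in", "the Symphony Hall Booking Office"),
    ],
)
print(frame)
```

The resulting frame is independent of how English happened to signal each role. A translation system could, in principle, generate the target sentence from the frame and choose whatever preposition, word ending or word order the target language uses for each deep case.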

Semantic representations
There were several significant developments in semantic processing. Schank and his co-workers introduced the notion of Conceptual Dependency, a method of expressing language in terms of semantic primitives. Systems were written which included no syntactic processing. Quillian's work on memory introduced the idea of the semantic network, which has been used in varying forms for knowledge representation in many systems. William Woods used the idea of procedural semantics to act as an intermediate representation between a language processing system and a database system.

The key systems were:

Terry Winograd's SHRDLU system simulated a robot which manipulated blocks on a table top. It could handle instructions such as "Pick up the red pyramid" and answer questions like "What does the blue box contain?". The importance of SHRDLU is that it showed that syntax, semantics and reasoning about the world can be combined to produce a system that understands natural language. It was a very limited system: it could handle only quite a restricted range of sentences. More importantly, it could only understand language relating to a minutely small part of the whole world: the world of blocks. Its power came from its very limited domain, and any attempt to scale the system up would result in increasingly less effective systems.

LUNAR was a database interface system that used ATNs and Woods' Procedural Semantics. It draws its name from the database used, which consisted of information about the lunar rock samples. The system was informally demonstrated at the Second Annual Lunar Science Conference in 1971. Its performance was quite impressive: it managed to handle 78% of requests without error, a figure that rose to 90% when dictionary errors were corrected. This figure is misleading because the system wasn't subject to intensive use. A scientist who used it to extract information for everyday work would soon have found that he wanted to make requests beyond the linguistic ability of the system.

LIFER/LADDER is one of the most impressive of NLP systems. It was designed as a natural language interface to a database of information about US Navy ships. It used a semantic grammar (that is, it used labels such as "SHIP" and "ATTRIBUTE" rather than syntactic labels such as noun and verb). This means that it was closely tied to the domain for which it was engineered, in a similar way to SHRDLU. However, the developers of the system used the semantic grammar to advantage by building various user-friendly features on top of the grammar. These included the ability to define new dictionary entries, to define paraphrases (for instance to make short-cuts possible) and to process incomplete input. These features are themselves very impressive, but the research team also embarked on a rigorous programme of evaluation. The published reports make rewarding reading for anyone studying NLP in depth. One of the main findings was that humans quickly adapt to the machine and attempt to use very incomplete sentences, even to the point of entering input that looks rather like informal database query language statements.
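
The idea of a semantic grammar, and the way it makes incomplete input easy to handle, can be sketched in Python as follows. Everything in the sketch is hypothetical: the ship names, the attribute values, the single question pattern and the way fragments are resolved are invented for illustration and are not taken from LIFER/LADDER itself.

```python
# Semantic categories and a toy database (all names and values invented).
CATEGORIES = {
    "SHIP": {"kestrel", "heron"},
    "ATTRIBUTE": {"length", "speed"},
}
DATABASE = {
    ("kestrel", "length"): 120,
    ("kestrel", "speed"): 18,
    ("heron", "length"): 95,
    ("heron", "speed"): 22,
}

# One rule of the semantic grammar: literal words plus category slots.
PATTERN = ["what", "is", "the", "ATTRIBUTE", "of", "the", "SHIP"]


def match(words, pattern=PATTERN):
    """Return the filled category slots if `words` fits the pattern, else None."""
    if len(words) != len(pattern):
        return None
    slots = {}
    for word, symbol in zip(words, pattern):
        if symbol in CATEGORIES:
            if word not in CATEGORIES[symbol]:
                return None
            slots[symbol] = word
        elif word != symbol:
            return None
    return slots


def answer(question, previous=None):
    """Answer a question; return (value, slots) so slots can be reused later.

    If the input is only a fragment, any category words it contains
    replace the matching slots of the previous question.
    """
    words = question.lower().rstrip("?").split()
    slots = match(words)
    if slots is None and previous is not None:
        overrides = {
            category: word
            for word in words
            for category, members in CATEGORIES.items()
            if word in members
        }
        if overrides:
            slots = {**previous, **overrides}
    if slots is None:
        return None, previous
    return DATABASE.get((slots["SHIP"], slots["ATTRIBUTE"])), slots


value, context = answer("What is the length of the kestrel?")
print(value)
value, context = answer("of the heron", context)
print(value)
```

Because the slots are named after domain concepts rather than parts of speech, a fragment such as "of the heron" can be resolved simply by spotting that "heron" is a SHIP. The price is the one noted above: the grammar is of no use outside its domain.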

It is invidious to select such a very small number of systems. Nonetheless, the systems chosen reflect major achievements. If one conclusion can be drawn from this phase of the NLP research effort, it is that applications required semantic knowledge in such large amounts as to make the development of practically useful systems a remote possibility. Some of the developments since 1980 can be seen as attempts to get round the semantic information bottleneck.

© P.J.Hancox@bham.ac.uk