University of Birmingham Computer Science
Teaching Support Material
Natural Language Processing & Applications
2007/08
Dr Peter Coxhead

Contents: [ Syllabus | Assessment | Support Material | Other Links ]

Syllabus

There is an official syllabus for 06-11223 Natural Language Processing & Applications, but note that I may not cover it all in any particular year.

Assessment

Introduction

Until 2005/06, the module had no continuous assessment. Instead, an essay topic was given to candidates in advance and formed the first, optional, question on a paper of five questions, of which three were to be answered in 2 hrs. From 2005/06 onwards, the paper contains only four slightly shorter questions, of which three are to be answered in 1.5 hrs. The former Question 1 has been replaced by continuous assessment, worth 20%.

Examination Papers and Reports

2000/01 Paper
2001/02 Paper
2002/03 Paper; Report
2003/04 Paper; Report
2004/05 Paper
2005/06 Paper; Report
2006/07 Paper; Report
2007/08 Paper; Report

(NB the above papers are as they were when I handed them to the School Office; minor changes might have been made afterwards.)

I have also collected together some relevant questions from the 1997-1999 examination papers (PDF format) at Aston, where I used to teach this topic. Note that you should ignore all references to Prolog -- this is not covered at Birmingham.

Support Material

Confused about verbs and nouns? Not sure about cases and genders? Consult the NLP Glossary.

You may be interested in my Introduction to Natural Language Processing [PDF], which has been used in the past as part of introductory AI modules.

The table below lists other support material available for NLPA and the section of the module to which it refers. The links are to PDF documents. The answers to exercises will be available only towards the end of the module.

Topic         | Subtopic   | Support Material
Introduction  |            | NLPA-Intro
Sound Systems |            | NLPA-Phon1, NLPA-Phon2, NLPA-Ans
Grammar       | Morphology | NLPA-Morph, NLPA-Ans
              | Syntax     | NLPA-Syntax, NLPA-Syntax-EngJap, NLPA-Syntax-Ex2a-Ans, NLPA-Ans
Meaning       |            | NLPA-Meaning, NLPA-Ans
Postscript    |            | NLPA-Postscript-OH
Coursework    |            | NLPA-CA-07

See also NLP Interactive. This web site was originally prepared by Dean Parker, a student from Aston University who graduated in 2000, as his final year project. It consists of online interactive tests and exercises. I have checked most of this material -- please tell me of any errors you find.

Other Links

Online information can change rapidly! Please e-mail me if you find any other relevant links or if you find any of these links broken.

Peter Hancox has a web site for the NLP1 module, which may be of interest, particularly for more detail on morphology and syntax.

There's a massive amount of information about natural languages available via the iLoveLanguages web page. (Some links are eccentric, to say the least.) Another general site for languages is that of UniLang.

Speech

The technology used at Birmingham New Street station is based on "canned text". There's some information about it in an article from Railnews.

There's an interesting discussion of speech synthesis in the article "Conversational Computers", Scientific American, June 2005, pp.40-45 (but remember the phonemes are SAE not SEE!). It's available online, but full access is restricted.

The way in which people react to synthesized voices is surveyed in an interesting 2007 New Scientist article [restricted access]. For example, male drivers were found not to trust a female voice giving directions.

A company called SitePal has an excellent web site promoting their speech synthesis software. As of 22 Jan 2007, to try speech synthesis in different languages, you can go straight to http://www.oddcast.com/home/demos/tts/frameset.php?frame1=sptalk (otherwise accessed from "Via Text-to-Speech" on http://www.sitepal.com/how). The quality of the synthesized speech varies by the voice chosen (e.g. the consensus of student native speakers is that Lily produces the best Chinese overall; Afroditi is much better for Greek than Artemis). The 'UK' English voices are very clearly distinct from the American ones.

You can also experiment with speech synthesis on the web using the Say... page.

Following the 'dot-com collapse' in the early 2000s, a number of very well-known companies involved in practical NLP disappeared, including Dragon Systems and Lernout & Hauspie. Many were bought out by large companies, such as Microsoft or IBM. A consequence of the changes seems to have been a noticeable decrease in the availability of information relevant to NLP online.

Dragon Systems is generally thought to have been the major force in the development of practical, PC-based speech recognition (although this was only made possible by years of previous research, both in universities and in other companies, such as IBM). Researchers who had previously expressed public scepticism about the possibility of carrying out real-time recognition of continuous speech on a desktop machine were rather caught out when Dragon produced a program which did just that! Dragon Systems was taken over by its rival L&H (=Lernout & Hauspie), who were perhaps better known for machine translation software. L&H then went under, and the speech technology was sold to ScanSoft, now apparently part of Nuance. Nuance also supply IBM's ViaVoice software.

Dragon's founder, Jim Baker, set up a company called Novauris, to develop "speech recognition technology that can make a significant practical difference to the world" [according to their web site in June 2004].

See also an article entitled "It pays to talk" from The Computer Bulletin.

Microsoft was originally reported to have been integrating a "natural user interface" (NUI) into the Vista version of its Windows operating system (then code-named Longhorn). The NUI (originally code-named Aero) was expected to include speech recognition and other aspects of NLP. Based on web sources, it appears to me that Microsoft backed off from some of its original aims. For those interested in computers and speech, the Microsoft Speech Site is worth watching.

Machine translation

MT is available online via a number of sites:

To check the accuracy of translations, either you need someone who knows the relevant language(s), or you can try putting the output of an English to X translator back through an X to English translator. Is one of these systems better than the others? Or are they about the same? (For German to English I think Altavista is better, but what about other language pairs?)
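The round-trip check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two translators are toy word-for-word lookups over an invented lexicon (not a real language and not any real MT service), and the word-overlap score from difflib is just one possible way to compare the original with its back-translation.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]


def make_word_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a toy word-for-word translator (a hypothetical stand-in for a real MT system)."""
    def translate(text: str) -> str:
        # Unknown words are passed through unchanged.
        return " ".join(lexicon.get(word.lower(), word) for word in text.split())
    return translate


def round_trip(text: str, forward: Translator, backward: Translator) -> Tuple[str, float]:
    """Translate forward, then back; return the back-translation and a
    word-level similarity to the original, between 0.0 and 1.0."""
    back = backward(forward(text))
    score = SequenceMatcher(None, text.lower().split(), back.lower().split()).ratio()
    return back, score


# Invented toy lexicon: "big" and "large" collapse to the same word in
# language X, so information is lost on the way out and cannot be recovered.
EN_TO_X = {"the": "ta", "big": "gro", "large": "gro", "dog": "hun", "sleeps": "dor"}
X_TO_EN = {"ta": "the", "gro": "big", "hun": "dog", "dor": "sleeps"}

to_x = make_word_translator(EN_TO_X)
from_x = make_word_translator(X_TO_EN)

back, score = round_trip("the large dog sleeps", to_x, from_x)
print(back, score)
```

With real systems, `forward` and `backward` would wrap whichever English to X and X to English translators are being compared. A high score does not prove that the intermediate translation is good, since errors made in one direction can be undone in the other; a low score does suggest that something was lost.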

Information about currently available machine translation software is probably best obtained via a Google search.

Multilingo is a company offering online translation services.