I am happy to supervise any AI/natural language processing project. In particular, I am currently very interested in using corpora (large bodies of text) and empirical methods to develop robust techniques and systems. The following are a few concrete projects I have in mind, but I am open to other ideas.
  • Affect & Emotion Detection in Text
    Certain keywords are associated with particular emotions. For example, words such as "good", "kind" and "happy" suggest that a newspaper story has a positive tone. I'd like to supervise a project that uses statistical methods to rate newspaper stories (or movie reviews) by their emotional content. One interesting application would be measuring how subjective or objective newspaper reporting is.
  • Text Summarisation Techniques
    More and more newspapers now publish their pages on the web. However, users are often too busy to read a whole newspaper. What is needed is a summarisation agent that fetches news from the internet and summarises it for the user.
  • Question Answering
    QA is the problem of automatically providing a single answer to a question, given a collection of documents (or potentially the whole internet). QA technologies have many obvious commercial applications, and QA systems have been extensively evaluated in research competitions. I'm very interested in projects that exploit QA techniques to build useful applications.
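As a rough illustration of the affect-detection idea, here is a minimal keyword-counting scorer in plain Python. It is a sketch under my own assumptions, not a proposed design: the word lists are hypothetical toy lexicons (only "good", "kind" and "happy" come from the description above), and "subjectivity" is crudely approximated as the share of sentiment-bearing words. A real project would learn word weights from a labelled corpus and would probably use NLTK for tokenisation.

```python
import re

# Hypothetical toy lexicons. Only "good", "kind" and "happy" come from the
# project description; a real system would learn word weights from a corpus.
POSITIVE = {"good", "kind", "happy"}
NEGATIVE = {"bad", "cruel", "sad"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def counts(text: str) -> tuple[int, int, int]:
    """Return (positive hits, negative hits, total tokens)."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos, neg, len(tokens)


def polarity(text: str) -> float:
    """(positive - negative) / tokens: above 0 leans positive, below 0 negative."""
    pos, neg, n = counts(text)
    return (pos - neg) / n if n else 0.0


def subjectivity(text: str) -> float:
    """Crude proxy: the fraction of tokens that carry any sentiment at all."""
    pos, neg, n = counts(text)
    return (pos + neg) / n if n else 0.0


if __name__ == "__main__":
    story = "A kind stranger made a sad day happy."
    print(polarity(story), subjectivity(story))  # prints 0.125 0.375
```

Normalising by story length keeps long and short stories comparable; the interesting part of an actual project would be replacing the hand-picked lists with statistics gathered from a corpus.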

A number of NLP toolkits exist which (hopefully) make more interesting projects possible than building a parser from scratch (though that's not a bad idea either). In particular, I'm interested in using NLTK, the open-source Natural Language Toolkit for Python. Of course, this would require you to learn some Python.

I'm also interested in supervising projects that tackle current industrial NLP challenges.

For example, http://www.mitre.org/work/challenge/

(quoted from their site)
"The current Challenge, the first in a series, entails multicultural name matching—a technology that is a key component of identity matching, which involves measuring the similarity of database records referring to people. Uses include verifying eligibility for Social Security or medical benefits, identifying and reunifying families in disaster relief operations, vetting persons against a travel watchlist, and merging or eliminating duplicate records in databases. Person name matching can also be used to improve the accuracy and speed of document searches, social network analysis, and other tasks in which the same person might be referred to by multiple versions or spellings of a name.

The task is to match a query file and an index file, each containing a list of names, against one another and produce a list of scored matches for each query name. Participants will receive a dataset and task guidelines, submit responses, and receive immediate feedback on their performance."
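The quoted task (score every index name against each query name and return a ranked list) can be sketched in a few lines of standard-library Python. This is only a naive baseline under assumptions of my own: the names are already loaded into lists (the challenge's actual file format and evaluation metric aren't described here), `difflib.SequenceMatcher` stands in for a proper name-similarity measure, and the 0.6 threshold is arbitrary. Genuinely multicultural matching would need much more, such as handling transliteration variants.

```python
import difflib
import unicodedata


def normalise(name: str) -> str:
    """Strip accents, lower-case, and sort tokens so word order is ignored."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(sorted(no_accents.lower().split()))


def score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalisation."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match(
    queries: list[str], index: list[str], threshold: float = 0.6
) -> dict[str, list[tuple[str, float]]]:
    """For each query name, list index names scoring >= threshold, best first."""
    results: dict[str, list[tuple[str, float]]] = {}
    for q in queries:
        scored = [(name, score(q, name)) for name in index]
        kept = [(name, s) for name, s in scored if s >= threshold]
        results[q] = sorted(kept, key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    print(match(["Mohammed Ali", "José Smith"], ["Muhammad Ali", "Smith Jose", "Jane Doe"]))
```

Sorting each name's tokens makes "Smith Jose" and "Jose Smith" identical, and stripping accents folds "José" into "Jose"; both are simple normalisations that a project would then try to improve on.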

This challenge will probably finish before the summer. However, the data will hopefully still be available, and by then there should be plenty of existing systems to compare against. Moreover, this is the first of a series of challenges, so a project in this area could be very interesting.

There are also plenty of other challenges available, and it would be nice to see how far we could get with one in a student project.