Created by W.Langdon from gp-bibliography.bib Revision:1.2031
@PhdThesis{siegel:thesis,
author = "Eric Siegel",
title = "Linguistic Indicators for Language Understanding:
Using Machine Learning Methods to Combine Corpus-Based
Indicators for Aspectual Classification of Clauses",
school = "Computer Science Department, Columbia University",
year = "1998",
keywords = "genetic algorithms, genetic programming",
URL = "http://www.cs.columbia.edu/~evs/papers/thesis.ps",
size = "pages",
abstract = "Linguistics as a field has provided enormous insights
that describe how the thoughts behind language are
reflected by the structure of sentences. For example,
one writes a paper in one week, but rides a bicycle for
one hour. This illustrates how prepositions (in and
for) correspond to the type of event. Specifically, in
modifies a completed process, while for modifies an
ongoing process. The area explored by this thesis is,
how can we best put our understanding of linguistics to
use in order to tap into the vast knowledge encoded in
texts?
The ability to distinguish stative clauses, e.g., ``She
resembles her mother,'' from event clauses, e.g., ``She
ran down the street,'' is a fundamental component of
natural language understanding. These two high-level
categories correspond to primitive distinctions in many
domains, including, for example, the distinctions
between diagnosis and procedure in the medical domain.
Stativity is the first of three high-level distinctions
that compose the aspectual class of a clause. These
distinctions in meaning have been well motivated by
work in linguistics and natural language
understanding.
Aspectual classification is a necessary component for
applications that perform certain natural language
interpretation, natural language generation,
summarization, information retrieval, and machine
translation tasks. This is because each of these
applications requires the ability to reason about
time.
In this thesis, I develop a system to perform aspectual
classification with linguistically-based, numerical
indicators. These linguistic indicators make use of an
array of aspectual markers, each of which has an
associated constraint on aspectual class. For example,
only clauses that describe an event can appear with the
progressive marker, e.g., ``I was eating breakfast.''
Therefore, the category of a verb or phrase is
reflected by a numerical indicator that measures how
often it occurs in the progressive. The values for such
linguistic indicators are computed automatically across
corpora of text. We develop and evaluate fourteen
indicators over unrestricted sets of verbs occurring
across two corpora. Our analysis reveals a predictive
value for several indicators that have not previously
been conjectured to correlate with aspect in the
linguistics literature.
Then, machine learning is used to combine multiple
indicators in order to improve classification
performance. The models automatically derived by
learning are manually examined, revealing several
linguistic insights regarding the indicators and their
interactions. Three machine learning techniques are
compared for this task: decision tree induction, a
genetic algorithm, and log-linear regression.
We conclude that linguistic indicators successfully
exploit linguistic insights to provide a much-needed
method for aspectual classification. Future work will
extend this approach to other semantic distinctions in
natural language.",
}