Linguistic Indicators for Language Understanding: Using Machine Learning Methods to Combine Corpus-Based Indicators for Aspectual Classification of Clauses

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

@PhdThesis{siegel:thesis,
  author =       "Eric Siegel",
  title =        "Linguistic Indicators for Language Understanding:
                 Using Machine Learning Methods to Combine Corpus-Based
                 Indicators for Aspectual Classification of Clauses",
  school =       "Computer Science Department, Columbia University",
  year =         "1998",
  address =      "New York, USA",
  month =        "20 " # may,
  keywords =     "genetic algorithms, genetic programming",
  URL =          "http://www.cs.columbia.edu/~evs/papers/thesis.ps",
  size =         "144 pages",
  abstract =     "Linguistics as a field has provided enormous insights
                 that describe how the thoughts behind language are
                 reflected by the structure of sentences. For example,
                 one writes a paper in one week, but rides a bicycle for
                 one hour. This illustrates how prepositions (in and
                 for) correspond to the type of event. Specifically, in
                 modifies a completed process, while for modifies an
                 ongoing process. The area explored by this thesis is,
                 how can we best put our understanding of linguistics to
                 use in order to tap into the vast knowledge encoded in
                 texts?

                 The ability to distinguish stative clauses, e.g., ``She
                 resembles her mother,'' from event clauses, e.g., ``She
                 ran down the street,'' is a fundamental component of
                 natural language understanding. These two high-level
                 categories correspond to primitive distinctions in many
                 domains, including, for example, the distinctions
                 between diagnosis and procedure in the medical domain.
                 Stativity is the first of three high-level distinctions
                 that compose the aspectual class of a clause. These
                 distinctions in meaning have been well motivated by
                 work in linguistics and natural language
                 understanding.

                 Aspectual classification is a necessary component for
                 applications that perform certain natural language
                 interpretation, natural language generation,
                 summarization, information retrieval, and machine
                 translation tasks. This is because each of these
                 applications requires the ability to reason about
                 time.

                 In this thesis, I develop a system to perform aspectual
                 classification with linguistically-based, numerical
                 indicators. These linguistic indicators make use of an
                 array of aspectual markers, each of which has an
                 associated constraint on aspectual class. For example,
                 only clauses that describe an event can appear with the
                 progressive marker, e.g., ``I was eating breakfast.''
                 Therefore, the category of a verb or phrase is
                 reflected by a numerical indicator that measures how
                 often it occurs in the progressive. The values for such
                 linguistic indicators are computed automatically across
                 corpora of text. We develop and evaluate fourteen
                 indicators over unrestricted sets of verbs occurring
                 across two corpora. Our analysis reveals a predictive
                 value for several indicators that have not previously
                 been conjectured to correlate with aspect in the
                 linguistics literature.

                 Then, machine learning is used to combine multiple
                 indicators in order to improve classification
                 performance. The models automatically derived by
                 learning are manually examined, revealing several
                 linguistic insights regarding the indicators and their
                 interactions. Three machine learning techniques are
                 compared for this task: decision tree induction, a
                 genetic algorithm, and log-linear regression.

                 We conclude that linguistic indicators successfully
                 exploit linguistic insights to provide a much-needed
                 method for aspectual classification. Future work will
                 extend this approach to other semantic distinctions in
                 natural language.",
}
