Intelligent Fusion of Evidence from Multiple Sources for Text Classification

Created by W.Langdon from gp-bibliography.bib Revision:1.4216

  title =        "Intelligent Fusion of Evidence from Multiple Sources
                 for Text Classification",
  author =       "Baoping Zhang",
  year =         "2006",
  month =        sep # "~06",
  school =       "Virginia Polytechnic Institute and State University",
  type =         "Doctor of Philosophy in Computer Science and
  address =      "USA",
  bibsource =    "OAI-PMH server at",
  contributor =  "Dan Spitzner and Chang-Tien Lu and Edward A. Fox and
                 Weiguo Fan and P{\'a}vel Calado",
  language =     "en",
  oai =          "oai:VTETD:etd-07032006-152103",
  rights =       "unrestricted; I hereby certify that, if appropriate, I
                 have obtained and attached hereto a written permission
                 statement from the owner(s) of each third party
                 copyrighted matter to be included in my thesis,
                 dissertation, or project report, allowing distribution
                 as specified below. I certify that the version I
                 submitted is the same as that approved by my advisory
                 committee. I hereby grant to Virginia Tech or its
                 agents the non-exclusive license to archive and make
                 accessible, under the conditions specified below, my
                 thesis, dissertation, or project report in whole or in
                 part in all forms of media, now or hereafter known. I
                 retain all other ownership rights to the copyright of
                 the thesis, dissertation or project report. I also
                 retain the right to use in future works (such as
                 articles or books) all or part of this thesis,
                 dissertation, or project report.",
  keywords =     "genetic algorithms, genetic programming",
  URL =          "",
  size =         "146 pages",
  abstract =     "Automatic text classification using current approaches
                 is known to perform poorly when documents are noisy or
                 when limited amounts of textual content are available.
                 Yet, many users need access to such documents, which
                 are found in large numbers in digital libraries and in
                 the WWW. If documents are not classified, they are
                 difficult to find when browsing. Further, searching
                 precision suffers when categories cannot be checked,
                 since many documents may be retrieved that would fail
                 to meet category constraints. In this work, we study
                 how different types of evidence from multiple sources
                 can be intelligently fused to improve classification of
                 text documents into predefined categories. We present a
                 classification framework based on an inductive learning
                 method -- Genetic Programming (GP) -- to fuse evidence
                 from multiple sources. We show that good classification
                 is possible with documents which are noisy or which
                 have small amounts of text (e.g., short metadata
                 records) -- if multiple sources of evidence are fused
                 in an intelligent way. The framework is validated
                 through experiments performed on documents in two
                 testbeds. One is the ACM Digital Library (using a
                 subset available in connection with CITIDEL, part of
                 NSF's National Science Digital Library). The other is
                 Web data, in particular that portion associated with
                 the Cad{\^e} Web directory. Our studies have shown that
                 improvement can be achieved relative to other machine
                 learning approaches if genetic programming methods are
                 combined with classifiers such as kNN. Extensive
                 analysis was performed to study the results generated
                 through the GP-based fusion approach and to understand
                 key factors that promote good classification.",
  notes =        "URN etd-07032006-152103",
