Evolving General Term-Weighting Schemes for Information Retrieval: Tests on Larger Collections

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Ronan Cummins and Colm O'Riordan",
  title =        "Evolving General Term-Weighting Schemes for
                 Information Retrieval: Tests on Larger Collections",
  journal =      "Artificial Intelligence Review",
  year =         "2005",
  volume =       "24",
  number =       "3--4",
  pages =        "277--299",
  month =        nov,
  email =        "ronan.cummins@nuigalway.ie",
  keywords =     "genetic algorithms, genetic programming,
                 term-weighting schemes, Information Retrieval",
  ISSN =         "0269-2821",
  DOI =          "10.1007/s10462-005-9001-y",
  abstract =     "Term-weighting schemes are vital to the performance of
                 Information Retrieval models that use term frequency
                 characteristics to determine the relevance of a
                 document. The vector space model is one such model in
                 which the weights assigned to the document terms are of
                 crucial importance to the accuracy of the retrieval
                 system. We describe a genetic programming framework
                 used to automatically determine term-weighting schemes
                 that achieve a high average precision. These schemes
                 are tested on standard test collections and are shown
                 to perform as well as, and often better than, the
                 modern BM25 weighting scheme. We present an analysis of
                 the schemes evolved to explain the increase in
                 performance. Furthermore, we show that the global
                 (collection wide) part of the evolved weighting schemes
                 also increases average precision over idf on larger
                 TREC data. These global weighting schemes are shown to
                 adhere to Luhn's resolving power as middle frequency
                 terms are assigned the highest weight. However, the
                 complete weighting schemes evolved on small collections
                 do not perform as well on large collections. We
                 conclude that in order to evolve improved local
                 (within-document) weighting schemes it is necessary to
                 evolve these on large collections.",
  notes =        "www.kluweronline.com/issn/0269-2821",
