The Evolution and Analysis of Term-Weighting Schemes in Information Retrieval

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Ronan Cummins",
  title =        "The Evolution and Analysis of Term-Weighting Schemes
                 in Information Retrieval",
  school =       "National University of Ireland, Galway",
  year =         "2008",
  month =        may,
  keywords =     "genetic algorithms, genetic programming",
  URL =          "",
  size =         "201 pages",
  abstract =     "Information Retrieval is concerned with the return of
                 relevant documents from a document collection given a
                 user query. Term-weighting schemes assign weights to
                 keywords (terms) based on how useful they are likely to
                 be in identifying the topic of a document and are one
                 of the most crucial aspects in relation to the
                 performance of Information Retrieval systems. Much
                 research has focused on developing both term-weighting
                 schemes and theories to support them.

                 Genetic Programming is a biologically-inspired search
                 algorithm useful for searching large complex search
                 spaces. It uses a Darwinian-inspired survival of the
                 fittest approach to search for solutions of a suitable
                 fitness. This thesis outlines experiments that use
                 Genetic Programming to search for term-weighting
                 schemes. A study of term-weighting schemes in the
                 literature is undertaken and consequently, the function
                 space is separated into three areas that represent
                 three fundamental concepts in term-weighting.

                 Experiments using Genetic Programming to search these
                 three function spaces show that term-weighting schemes
                 that outperform state-of-the-art term-weighting
                 benchmarks can be found. These experiments also show
                 that the new term-weighting schemes have general
                 properties as they achieve high performance on unseen
                 test data. An analysis of the solution space of the
                 term-weighting schemes shows that the evolved solutions
                 exist in a different part of the space than the current
                 benchmarks. These experiments show that the Genetic
                 Programming approach consistently evolves solutions
                 that return similar ranked lists in each of the three
                 function spaces.

                 Furthermore, the best performing term-weighting schemes
                 are formally analysed and are shown to satisfy a number
                 of axioms in Information Retrieval. A detailed analysis
                 of the existing axioms is presented together with some
                 amendments and additions to the existing axioms. This
                 analysis aids in theoretically validating the
                 term-weighting schemes evolved in the

                 Finally, a secondary application of Genetic Programming
                 to Information Retrieval is presented to show the
                 potential for Genetic Programming in addressing other
                 issues in Information Retrieval. This experiment shows
                 that Genetic Programming can be used to combine further
                 evidence in the retrieval process to enhance
                 performance. This approach evolves schemes for use with
                 two automatic query expansion techniques to increase
                 retrieval effectiveness.",
  notes =        "Supervisor: Colm O'Riordan",
