Detecting research topics via the correlation between graphs and texts

Created by W.Langdon from gp-bibliography.bib Revision:1.4333

  author =       "Yookyung Jo and Carl Lagoze and C. Lee Giles",
  title =        "Detecting research topics via the correlation between
                 graphs and texts",
  booktitle =    "Proceedings of the 13th ACM SIGKDD International
                 Conference on Knowledge Discovery and Data Mining",
  year =         "2007",
  editor =       "Pavel Berkhin and Rich Caruana and Xindong Wu",
  pages =        "370--379",
  address =      "San Jose, California, USA",
  month =        aug # " 12-15",
  publisher =    "ACM",
  keywords =     "genetic algorithms, genetic programming, Algorithms,
                 Languages, Measurement, topic detection, graph mining,
                 probabilistic measure, citation graphs, correlation of
                 text and links",
  isbn13 =       "978-1-59593-609-7",
  bibsource =    "DBLP,",
  DOI =          "10.1145/1281192.1281234",
  size =         "10 pages",
  abstract =     "In this paper we address the problem of detecting
                 topics in large-scale linked document collections.
                 Recently, topic detection has become a very active area
                 of research due to its utility for information
                 navigation, trend analysis, and high-level description
                 of data. We present a unique approach that uses the
                 correlation between the distribution of a term that
                 represents a topic and the link distribution in the
                 citation graph where the nodes are limited to the
                 documents containing the term. This tight coupling
                 between term and graph analysis is distinguished from
                 other approaches such as those that focus on language
                 models. We develop a topic score measure for each term,
                 using the likelihood ratio of binary hypotheses based
                 on a probabilistic description of graph connectivity.
                 Our approach is based on the intuition that if a term
                 is relevant to a topic, the documents containing the
                 term have denser connectivity than a random selection
                 of documents. We extend our algorithm to detect a topic
                 represented by a set of terms, using the intuition that
                 if the co-occurrence of terms represents a new topic,
                 the citation pattern should exhibit the synergistic
                 effect. We test our algorithm on two electronic
                 research literature collections, arXiv and Citeseer.
                 Our evaluation shows that the approach is effective and
                 reveals some novel aspects of topic detection.",
  notes =        "GP literature used as one example",
