A Genetic Programming Slant on the Way to Record De-Duplication in Repositories

Created by W.Langdon from gp-bibliography.bib Revision:1.4208

  author =       "S. Preethy and A. Daniel Das",
  title =        "A Genetic Programming Slant on the Way to Record
                 De-Duplication in Repositories",
  journal =      "IJIET",
  year =         "2013",
  volume =       "2",
  number =       "2",
  pages =        "60--64",
  month =        apr,
  keywords =     "genetic algorithms, genetic programming,
                 de-duplication, computation",
  annote =       "The Pennsylvania State University CiteSeerX Archives",
  bibsource =    "OAI-PMH server at citeseerx.ist.psu.edu",
  language =     "en",
  oai =          "oai:CiteSeerX.psu:",
  rights =       "Metadata may be used without restrictions as long as
                 the oai identifier remains attached to it.",
  URL =          "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=",
  URL =          "http://ijiet.com/wp-content/uploads/2013/05/10.pdf",
  ISSN =         "2319-1058",
  URL =          "http://ijiet.com/issues/volume-2-issue-2-april-2013/",
  size =         "5 pages",
  abstract =     "Several systems that rely on consistent data to offer
                 high-quality services, such as digital libraries and
                 e-commerce brokers, may be affected by the existence of
                 duplicates, quasi replicas, or near-duplicate entries
                 in their repositories. Because of that, there have been
                 significant investments from private and government
                 organisations for developing methods for removing
                 replicas from their data repositories. This is due to the
                 fact that clean and replica-free repositories not only
                 allow the retrieval of higher quality information but
                 also lead to more concise data and to potential savings
                 in computational time and resources to process this
                 data. In this paper, we propose a genetic programming
                 approach to record de-duplication that combines
                 several different pieces of evidence extracted from the
                 data content to find a de-duplication function that is
                 able to identify whether two entries in a repository
                 are replicas or not. As shown by our experiments, our
                 approach outperforms an existing state-of-the-art
                 method found in the literature. Moreover, the suggested
                 functions are computationally less demanding since they
                 use less evidence. In addition, our genetic
                 programming approach is capable of automatically
                 adapting these functions to a given fixed replica
                 identification boundary, freeing the user from the
                 burden of having to choose and tune this parameter.",
  notes =        "Department of Information Technology N.P.R. College of
                 Engineering and Technology, Dindigul, Tamilnadu,

                 Department of Mechanical Engineering N.P.R. College of
                 Engineering and Technology, Dindigul, Tamilnadu,
