A Detection of Duplicate Records from Multiple Web Databases using pattern matching in UDD

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Dewendra Bharambe and Susheel Jain and Anurag Jain",
  title =        "A Detection of Duplicate Records from Multiple Web
                 Databases using pattern matching in UDD",
  journal =      "International Journal of Emerging Technology and
                 Advanced Engineering",
  year =         "2013",
  volume =       "3",
  number =       "5",
  pages =        "412--417",
  month =        may,
  keywords =     "genetic algorithms, genetic programming, data
                 deduplication, UDD, SVM, WCSS, genetic algorithm,
                 pattern matching",
  ISSN =         "2250-2459",
  annote =       "The Pennsylvania State University CiteSeerX Archives",
  bibsource =    "OAI-PMH server at citeseerx.ist.psu.edu",
  language =     "en",
  oai =          "oai:CiteSeerX.psu:",
  rights =       "Metadata may be used without restrictions as long as
                 the oai identifier remains attached to it.",
  URL =          "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=",
  URL =          "http://www.ijetae.com/files/Volume3Issue5/IJETAE_0513_68.pdf",
  URL =          "http://www.ijetae.com/Volume3Issue5.html",
  abstract =     "Record matching, the task of finding entries that
                 refer to the same entity in two or more files, is a
                 vital process in data integration. Most supervised
                 record matching methods require training data provided
                 by users. Such methods cannot be applied to the web
                 database scenario, where query results are generated
                 dynamically. In the existing system, an unsupervised
                 record matching method effectively identifies
                 duplicates among the query result records of multiple
                 web databases by identifying the duplicate and
                 non-duplicate sets in the source and then searching
                 the non-duplicate set again for duplicates. It then
                 uses two cooperative classifiers on the non-duplicate
                 set: the Weighted Component Similarity Summing (WCSS)
                 classifier and the Support Vector Machine (SVM)
                 classifier. These two classifiers can be used to
                 identify duplicates in the query results iteratively
                 from multiple web databases. In this paper we modify
                 the record matching algorithm with a genetic
                 algorithm. Genetic programming is time-consuming, so
                 we propose UDD with genetic programming. A performance
                 evaluation for accuracy is done for the dataset with
                 duplicates using UDD and UDD with Genetic
  notes =        "Article 68.",

Genetic Programming entries for Dewendra Onkar Bharambe, Susheel Jain, Anurag Jain