Concise Pattern Learning for RDF Data Sets Interlinking

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  title =        "Concise Pattern Learning for {RDF} Data Sets
                 Interlinking",
  titletranslation = "Apprentissage de Motifs Concis pour le Liage de
                 Donn{\'e}es RDF",
  author =       "Zhengjie Fan",
  year =         "2014",
  school =       "Universit{\'e} de Grenoble",
  address =      "France",
  month =        "7 " # aug,
  keywords =     "genetic algorithms, genetic programming, interlinking,
                 ontology matching, machine learning",
  annote =       "Computer mediated exchange of structured knowledge
                 (EXMO) ; Inria Grenoble - Rh{\^o}ne-Alpes ; INRIA -
                 Laboratoire d'Informatique de Grenoble (LIG) ; CNRS -
                 Universit{\'e} Pierre Mend{\`e}s France (Grenoble 2
                 UPMF) - Institut National Polytechnique de Grenoble
                 (INPG) - Universit{\'e} Joseph Fourier (Grenoble 1
                 UJF); Universit{\'e} de Grenoble; J{\'e}r{\^o}me",
  bibsource =    "OAI-PMH server at",
  contributor =  "Computer mediated exchange of structured knowledge and
                 J{\'e}r{\^o}me Euzenat and Datalift",
  identifier =   "tel-00986104",
  language =     "english",
  oai =          "oai:HAL:tel-00986104v1",
  rights =       "info:eu-repo/semantics/openAccess",
  type =         "info:eu-repo/semantics/doctoralThesis; Theses",
  size =         "169 pages",
  abstract =     "Many data sets are being published on the Web using
                 Semantic Web technologies. These data sets contain
                 analogous data that represent the same real-world
                 resources. If the data sets are linked together
                 correctly, users can conveniently query their data
                 through a uniform interface, as if querying a single
                 data set. However, finding correct links is very
                 challenging because there are many instances to
                 compare. Many solutions have been proposed for this
                 problem. (1) A straightforward idea is to compare the
                 attribute values of instances to identify links, yet
                 it is infeasible to compare all possible pairs of
                 attribute values. (2) Another common strategy is to
                 compare instances according to attribute
                 correspondences found by instance-based ontology
                 matching, which generates attribute correspondences
                 from instances. However, it is hard to identify the
                 same instances across data sets this way, because some
                 identical instances have unequal values for some
                 attribute correspondences. (3) Many existing solutions
                 use Genetic Programming to construct interlinking
                 patterns for comparing instances, but they suffer from
                 long running times. In this thesis, an interlinking
                 method is proposed to interlink the same instances
                 across different data sets, based on both statistical
                 learning and symbolic learning. The input consists of
                 two data sets, class correspondences across the two
                 data sets, and a set of sample links that users have
                 assessed as either positive or negative. From the
                 assessed sample links, the method builds a classifier
                 that distinguishes correct links from incorrect links
                 across the two RDF data sets. The classifier is
                 composed of attribute correspondences across
                 corresponding classes of the two data sets, which help
                 compare instances and build links; in this thesis it
                 is called an interlinking pattern. On the one hand,
                 our method discovers potential attribute
                 correspondences for each class correspondence with a
                 statistical learning method, the K-medoids clustering
                 algorithm, applied to instance value statistics. On
                 the other hand, it builds the interlinking pattern
                 with a symbolic learning method, Version Space, from
                 all discovered potential attribute correspondences and
                 the set of assessed sample links. Our method can
                 handle, with a concisely formatted pattern,
                 interlinking tasks for which no conjunctive
                 interlinking pattern covers all assessed correct
                 links. Experiments show that, with only 1\% of sample
                 links, our interlinking method already reaches a high
                 F-measure (around 0.94 to 0.99). The F-measure
                 converges quickly, improving on other approaches by
                 nearly 10\%.",
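The K-medoids step described in the abstract (finding potential attribute correspondences from instance value statistics) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not the thesis's implementation. Each property is summarised by a hypothetical statistics vector (mean value length and fraction of digit characters). Distances are Euclidean, and medoids are initialised with the first k points. The update is the simple alternating (Voronoi iteration) variant of k-medoids rather than PAM. Properties of the two data sets that land in the same cluster are paired as candidate attribute correspondences. The helper names `value_stats`, `k_medoids` and `candidate_correspondences` are hypothetical.

```python
import math
from itertools import product


def value_stats(values):
    """Hypothetical statistics vector for one property: (mean value length, digit ratio)."""
    total_chars = sum(len(v) for v in values)
    digits = sum(ch.isdigit() for v in values for ch in v)
    mean_len = total_chars / len(values)
    digit_ratio = digits / total_chars if total_chars else 0.0
    return (mean_len, digit_ratio)


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids (assumes k <= len(points)): assign each point to its
    nearest medoid, then move each medoid to the member with the smallest total
    distance to the rest of its cluster."""
    medoids = list(range(k))  # deterministic initialisation: the first k points
    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep an empty cluster's medoid unchanged
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members),
            ))
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids are stable, so the clustering has converged
        medoids = new_medoids
    return clusters


def candidate_correspondences(props_a, props_b, k):
    """Pair properties of data sets A and B that fall into the same cluster."""
    names = [("A", n) for n in props_a] + [("B", n) for n in props_b]
    points = [value_stats((props_a if side == "A" else props_b)[n]) for side, n in names]
    pairs = set()
    for members in k_medoids(points, k).values():
        a_side = [names[i][1] for i in members if names[i][0] == "A"]
        b_side = [names[i][1] for i in members if names[i][0] == "B"]
        pairs.update(product(a_side, b_side))
    return pairs


if __name__ == "__main__":
    # Toy property values, invented for illustration only.
    props_a = {"name": ["Grenoble", "Lyon", "Paris"], "postcode": ["38000", "69001", "75001"]}
    props_b = {"label": ["Grenoble", "Lyon", "Marseille"], "zip": ["38000", "69002", "13001"]}
    print(sorted(candidate_correspondences(props_a, props_b, k=2)))
    # prints [('name', 'label'), ('postcode', 'zip')]
```

The features are not normalised here, so value length dominates the distance. A real setting would likely need scaling and richer statistics.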

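The Version Space step described in the abstract can be sketched under simplifying assumptions. Each assessed sample link is reduced to the set of candidate attribute correspondences whose values agree for that link; labels such as "name~label" are hypothetical. A conjunctive pattern is a set of correspondences that must all agree. Generalising two conjunctions means intersecting them, as when the specific boundary of a version space over boolean features moves. The greedy procedure below opens a new disjunct whenever generalising would cover a negative link. It is an illustrative stand-in, not the thesis's algorithm, and the names `learn_pattern` and `is_link` are hypothetical.

```python
def covers(conjunct, link):
    """A conjunctive pattern covers a link if every required correspondence agrees on it."""
    return conjunct <= link


def learn_pattern(positives, negatives):
    """Greedy disjunction of conjunctions learned from assessed sample links.

    Each conjunct is generalised by intersection (the most specific pattern covering
    the positive links merged into it) as long as it still covers no negative link;
    otherwise a new disjunct is started for the positive link.
    """
    negatives = [frozenset(n) for n in negatives]
    pattern = []
    for pos in map(frozenset, positives):
        for i, conj in enumerate(pattern):
            merged = conj & pos
            if not any(covers(merged, neg) for neg in negatives):
                pattern[i] = merged
                break
        else:  # no existing conjunct could absorb this positive link
            pattern.append(pos)
    return pattern


def is_link(pattern, link):
    """Classify a candidate link: accept it if any disjunct covers it."""
    return any(covers(conj, frozenset(link)) for conj in pattern)


if __name__ == "__main__":
    positives = [
        {"name~label", "postcode~zip"},
        {"name~label", "postcode~zip", "date~year"},
        {"name~label", "date~year"},
    ]
    negatives = [{"name~label"}, {"postcode~zip"}]
    pattern = learn_pattern(positives, negatives)
    # Intersecting all positives leaves only {"name~label"}, which covers a negative
    # link, so no single conjunction fits and the sketch returns two disjuncts.
    print(len(pattern))                                   # prints 2
    print(is_link(pattern, {"name~label"}))               # prints False
    print(is_link(pattern, {"name~label", "date~year"}))  # prints True
```

The result depends on the order of the positive links. A positive link that already covers a negative one (inconsistent assessments) is kept as its own disjunct rather than rejected.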
Genetic Programming entries for Zhengjie Fan