Discovery of characteristic knowledge in databases using cluster analysis and genetic programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  title =        "Discovery of characteristic knowledge in databases
                 using cluster analysis and genetic programming",
  author =       "Tae-wan Ryu",
  year =         "1998",
  description =  "Degree granted by Dept. of Computer Science.; Thesis
                 (Ph. D.)--University of Houston, 1998.; Includes
                 bibliographical references (leaves 142-150).",
  school =       "Department of Computer Science, University of
                 Houston",
  address =      "USA",
  month =        dec,
  keywords =     "genetic algorithms, genetic programming, Computer
                 science, Cluster analysis--Data processing",
  size =         "156 pages",
  abstract =     "Knowledge discovery in data (KDD) is the generic
                 approach to analyse and extract useful knowledge from
                 data collections using computerised tools. Applying KDD
                 techniques directly to a database is not
                 straightforward, since in a database, there may be
                 several views of the database depending on the user's
                 interests, unlike the data collections stored in a
                 single flat file format. Moreover, in many cases, there
                 is a data model discrepancy between the target database
                 and the representation format for the input data set
                 that most KDD techniques expect. The presented research
                 centres on developing methodologies, techniques, and
                 tools to discover useful characteristic knowledge in
                 databases. Our approach is first to partition a given
                 database into several clusters with similar properties
                 using cluster analysis, and then to discover
                 characteristic knowledge in each cluster using genetic
                 programming. In this research, we analysed the problems
                 in clustering databases. We proposed an extended data
                 set format as an input data set format that can store
                 related information, unlike a traditional flat file
                 format. We developed an automatic tool that generates
                 an extended data set from databases, which may contain
                 the related information from related tables or classes.
                 We proposed a unified similarity framework that can
                 cope with various kinds of data sets, and generalised
                 clustering algorithms for the proposed similarity
                 framework. We also developed a discovery system that
                 takes the set of data objects in each cluster and
                 discovers characteristic knowledge for the given object
                 set using genetic programming.",
  notes =        "UMI 9917211
                 Supervisor: Christoph F. Eick",
