Discovering Knowledge from Noisy Databases using Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Man Leung Wong and Kwong Sak Leung and 
                 Jack C. Y. Cheng",
  title =        "Discovering Knowledge from Noisy Databases using
                 Genetic Programming",
  journal =      "Journal of the American Society for Information
                 Science",
  year =         "2000",
  volume =       "51",
  pages =        "870--881",
  keywords =     "genetic algorithms, genetic programming, Data mining,
                 Evolutionary Computation, Rule Learning",
  abstract =     "In data mining, we emphasise the need for learning
                 from huge, incomplete and imperfect data sets (Fayyad
                 et al. 1996, Frawley et al. 1991, Piatetsky-Shapiro and
                 Frawley, 1991). To handle noise in the problem domain,
                 existing learning systems avoid overfitting the
                 imperfect training examples by excluding insignificant
                 patterns. The problem is that these systems use a
                 limiting attribute-value language for representing the
                 training examples and the induced knowledge. Moreover,
                 some important patterns are ignored because they are
                 statistically insignificant. In this paper, we present
                 a framework that combines Genetic Programming (Koza
                 1992; 1994) and Inductive Logic Programming (Muggleton,
                 1992) to induce knowledge represented in various
                 knowledge representation formalisms from noisy
                 databases. The framework is based on a formalism of
                 logic grammars and it can specify the search space
                 declaratively. An implementation of the framework,
                 LOGENPRO (The Logic grammar based GENetic PROgramming
                 system), has been developed. The performance of
                 LOGENPRO is evaluated on the chess endgame domain. We
                 compare LOGENPRO with FOIL and other learning systems
                 in detail and find its performance is significantly
                 better than that of the others. This result indicates
                 that the Darwinian principle of natural selection is a
                 plausible noise handling method which can avoid
                 overfitting and identify important patterns at the same
                 time. Moreover, the system is applied to one real-life
                 medical database. The knowledge discovered provides
                 insights into, and allows a better understanding of,
                 the medical domain.",

Genetic Programming entries for Man Leung Wong, Kwong-Sak Leung, Jack Chun-yiu Cheng