Discovering interesting knowledge from a science & technology database with a genetic algorithm

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

  author =       "Wesley Romao and Alex A. Freitas and 
                 Itana M. {de S. Gimenes}",
  title =        "Discovering interesting knowledge from a science \&
                 technology database with a genetic algorithm",
  journal =      "Applied Soft Computing",
  year =         "2004",
  volume =       "4",
  pages =        "121--137",
  keywords =     "genetic algorithms, genetic programming, data mining,
                 classification, rule interestingness",
  URL =          "",
  DOI =          "doi:10.1016/j.asoc.2003.10.002",
  ISSN =         "1568-4946",
  size =         "17 pages",
  abstract =     "Data mining consists of extracting interesting
                 knowledge from data. This paper addresses the discovery
                 of knowledge in the form of prediction IF-THEN rules,
                 which are a popular form of knowledge representation in
                 data mining. In this context, we propose a genetic
                 algorithm (GA) designed specifically to discover
                 interesting fuzzy prediction rules. The GA searches for
                 prediction rules that are interesting in the sense of
                 being new and surprising for the user. This is done by
                 adapting a technique little exploited in the
                 literature, which is based on user-defined general
                 impressions (subjective knowledge). More precisely, a
                 prediction rule is considered interesting (or
                 surprising) to the extent that it represents knowledge
                 that not only was previously unknown by the user but
                 also contradicts his original beliefs. In addition,
                 the use of fuzzy logic helps to improve the
                 comprehensibility of the rules discovered by the GA.
                 This is due to the use of linguistic terms that are
                 natural for the user. A prototype was implemented and
                 applied to a real-world science \& technology database,
                 containing data about the scientific production of
                 researchers. The GA implemented in this prototype was
                 evaluated by comparing it with the J4.8 algorithm, a
                 variant of the well-known C4.5 algorithm. Experiments
                 were carried out to evaluate both the predictive
                 accuracy and the degree of interestingness (or
                 surprisingness) of the rules discovered by both
                 algorithms. The predictive accuracy obtained by the
                 proposed GA was similar to the one obtained by J4.8,
                 but the former, in general, discovered rules with fewer
                 conditions. In addition, it works with natural
                 linguistic terms, which leads to the discovery of more
                 comprehensible knowledge. The rules discovered by the
                 proposed GA and the best rules discovered by J4.8 were
                 shown in an interview to a user (a University Director),
                 who evaluated the degree of interestingness
                 (surprisingness) of the rules to him. In general, the
                 user considered the rules discovered by the GA much
                 more interesting than the rules discovered by J4.8.",

Genetic Programming entries for Wesley Romao, Alex Alves Freitas, Itana M de S Gimenes