Study on association rule retrieval and association rule-based classification using evolutionary computation

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Guangfei Yang",
  title =        "Study on association rule retrieval and association
                 rule-based classification using evolutionary
  school =       "Waseda University",
  year =         "2009",
  address =      "Japan",
  month =        may,
  keywords =     "genetic algorithms, genetic programming, Genetic
                 Network Programming",
  URL =          "",
  abstract_url = "",
  japanese_url = "",
  URL =          "",
  size =         "146 pages",
  abstract =     "In recent years, data mining becomes more and more
                 important, and association rule mining is one of the
                 most attractive data mining techniques. In the research
                 of association rule mining, it is a remaining unsolved
                 problem to mine interesting association rules, because
                 the conventional algorithms usually give too many

                 This thesis proposes some new methods based on
                 evolutionary computation to mine interesting
                 association rule s .

                 In this thesis, it is argued that the interestingness
                 of association rules is a task-dependent concept, viz:
                 different meanings of interestingness should be defined
                 for different tasks. Without specifying a particular
                 task, it is ambiguous to discuss the interestingness.
                 This thesis studies two tasks: keyword-based
                 association rule retrieval and association rule based
                 classification, which are denoted as TASK I and TASK II
                 , respectively.

                 TASK I is to find interesting association rules for
                 retrieval, by borrowing the keyword-based style of
                 information retrieval to let the human beings search
                 rules. Association rule is a useful tool to analyse the
                 data. However, due to the huge number of association
                 rules, people usually can not find what they are
                 interested in. In information retrieval, keywords are
                 most often used to search interesting information that
                 people are interested in, and in this thesis, the
                 keyword-based style is borrowed to help people to
                 search interesting association rules.",
  abstract =     "TASK II is to find interesting association rules for
                 classification. The association rule based
                 classification is also called associative
                 classification for short, and the accuracy of
                 associative classification methods are even better than
                 conventional classification methods. However, most
                 associative classification methods mine a large number
                 of rules, and how to find interesting rules to increase
                 the classification accuracy further is still a
                 challenging work.

                 Association rule is a well-defined and deterministic
                 concept. Given the minimum support and minimum
                 confidence, any conventional association rule mining
                 method, like A priori, FP-growth, will give the same
                 rules satisfying the thresholds. However, many problems
                 are not deterministic, and it is not always a good
                 choice to solve a non-deterministic problem by a
                 deterministic method. Let's take associative
                 classification 2 for example. Classification is an
                 ill-defined, non-deterministic problem, in the sense
                 that one can not be sure the rules learned from
                 training data will correctly classify the testing data.
                 When applied to classification, the deterministic
                 property of association rules may cause that their
                 abilities are not fully used.

                 Evolutionary computation is a good solution for
                 non-deterministic problems, and it can adapt to
                 different kinds of tasks effectively. Genetic Network
                 Programming (GNP) is a recently developed evolutionary
                 method, which has been applied to various applications
                 successfully. This thesis will discuss how to mine
                 interesting association rules based on the evolutionary
                 computation, especially based on GNP, for TASK I and
                 TASK II.

                 Generally speaking, there are two ways to solve TASK I
                 and TASK II based on evolutionary computation:

                 1. Mine association rules by conventional methods
                 first, and then rank the rules based on evolutionary
                 computation, which places the more interesting rules in
                 the upper positions. This approach is called APPROACH I

                 2. Directly mine a small number of interesting
                 association rules based on evolutionary computation.
                 This kind of approach is called APPROACH II .

                 Chapter 1 introduces the research background, and
                 describes the general ideas of the research in this
                 thesis, including the framework of task-dependent
                 interestingness of association rules, as well as the
                 introductions of TASK I, TASK II, APPROACH I and
                 APPROACH II.

                 Chapter 2 discusses how to solve TASK I by APPROACH I.
                 After the association rules are mined by conventional
                 methods and some keywords are input by a user, a model
                 named RuleRank is proposed to rank the association
                 rules so that the more interesting rules are placed in
                 higher positions. The RuleRank model is built in a
                 supervised learning manner, and then the trained model
                 automatically ranks the rules for the user. From the
                 simulation results, it is found that the proposed
                 method could give satisfactory ranking results for the

                 Chapter 3 discusses how to solve TASK I by APPROACH II.
                 In the method proposed in Chapter 2, a lot of
                 association rules are mined first, however, most of
                 them are uninteresting. In this chapter, a new
                 evolutionary method is 3 proposed to directly mine a
                 small number of interesting association rules. Besides,
                 the dictionary-like style is adopted to give more
                 meaningful annotations to the rules so that the users
                 could have more information to understand the rules
                 better. Some experiments are given to demonstrate the
                 mining of interesting association rules, and the
                 demonstrations show that the proposed method could
                 effectively find the rules which are meaningful,
                 understandable and interesting to the keywords.",
  abstract =     "Chapter 4 discusses how to solve TASK II by APPROACH
                 I. CMAR is one of the representative associative
                 classification methods, and it is found that by
                 adjusting the ranking of rules in CMAR, the
                 classification accuracy could be improved. In this
                 chapter, RuleRank model is applied to obtain the
                 high-quality rankings, and the simulation results show
                 that the RuleRank model could effectively increase the
                 classification accuracy.

                 Chapter 5 discusses how to solve TASK II by APPROACH
                 II. The method proposed in Chapter 4 is based on a
                 large number of rules mined by CMAR, where most of the
                 rules are actually pruned in classification. In this
                 chapter, a new method is proposed to directly mine a
                 few association rules which are interesting for
                 classification. There are three novel ideas discussed
                 in this chapter: attribute-centric approach,
                 record-centric approach and rule-centric approach. The
                 role of attribute-centric approach is to find
                 potentially interesting rules in an efficient manner.
                 The record-centric approach is to reduce the number of
                 wrongly classified or not classified records in the
                 database. The rule-centric is to evaluate the real
                 interestingness of rules, and give award to good rules
                 and penalty to bad rules. The simulation results show
                 that the proposed method has better accuracy than CMAR,
                 and it is faster, too.

                 At last, some conclusions are given to describe the
                 main achievements of this thesis. By applying APPROACH
                 I to TASK I, the user could find interesting
                 association rules which are ranked in a satisfactory
                 order, while by applying APPROACH II to TASK I, the
                 user could directly search a small number of
                 interesting rules which are well annotated. By applying
                 APPROACH I to TASK II, the accuracy of association rule
BibTeX entry too long. Truncated

Genetic Programming entries for Guangfei Yang