Mining association rules on Big Data through MapReduce genetic programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4549

  author =       "Francisco Padillo and Jose Maria Luna and 
                 Francisco Herrera and Sebastian Ventura",
  title =        "Mining association rules on Big Data through
                 {MapReduce} genetic programming",
  journal =      "Integrated Computer-Aided Engineering",
  year =         "2018",
  volume =       "25",
  number =       "1",
  pages =        "31--48",
  keywords =     "genetic algorithms, genetic programming, Association
                 rules, Big Data, MapReduce, Hadoop, Spark",
  ISSN =         "1069-2509",
  publisher =    "IOS Press",
  DOI =          "doi:10.3233/ICA-170555",
  size =         "18 pages",
  abstract =     "Association rule mining is one of the most important
                 tasks to describe raw data. Although many efficient
                 algorithms have been developed to this aim, existing
                 algorithms do not work well on huge volumes of data.
                 The aim of this paper is to propose a new genetic
                 programming algorithm for mining association rules in
                 Big Data. The genetic operators of our proposal have
                 been specifically designed to avoid a growing in the
                 complexity of the solutions without an improvement in
                 their fitness function values. Furthermore, it
                 introduces a repairing operator to improve the
                 convergence. Additionally, to facilitate its
                 application on real world problems a grammar has been
                 included, allowing it to introduce subjective knowledge
                 into the mining process and to reduce the search space.
                 Due to the growing interest in data gathering, a unique
                 implementation of the proposed algorithm is not useful
                 so different implementations (considering different
                 architectures such as RMI, Hadoop and Spark) are
                 required depending on the data size. All these
                 adaptations obtain exactly the same solutions as those
                 of the original algorithm since they only differ on the
                 software architectures. The experimental study
                 considers more than 75 datasets and 14 algorithms and
                 the results reveal that the proposed algorithm obtains
                 excellent results for more than 12 quality measures.
                 The scalability of the proposal is also analysed by
                 considering the three parallel implementations on high
                 dimensional datasets (3,000 millions of instances) and
                 file sizes up to 800 GB.",
