Classification Algorithms for Big Data over distributed processing frameworks

Created by W.Langdon from gp-bibliography.bib Revision:1.4340

  author =       "Armando Segatori",
  title =        "Classification Algorithms for Big Data over
                 distributed processing frameworks",
  school =       "Pisa University",
  year =         "2016",
  address =      "Italy",
  keywords =     "genetic algorithms, genetic programming",
  URL =          "",
  abstract =     "Classification problems have been widely studied in
                 the context of data mining and different approaches to
                 address these problems have been developed in the last
                 decades. Among them, associative classification and
                 decision trees have proved to be very effective and
                 have been successfully employed in several application
                 domains. Furthermore, some of these approaches have
                 integrated the fuzzy set theory with the objective of
                 dealing with uncertain and noisy data. Unfortunately,
                 most of the approaches proposed up to now have been
                 designed for maximizing accuracy, often neglecting the
                 complexity in terms of both memory and execution
                 time. Thus, these approaches are generally not able to
                 handle adequately the so-called ``big data''. In this
                 Ph.D. thesis, we propose different solutions in a
                 distributed environment for generating accurate and
                 interpretable classification models for big data. In
                 particular, we focus on associative classification and
                 decision trees, integrating our solutions with fuzzy
                 set theory. Since the generation of such models
                 requires that continuous features are discretized, we
                 also propose a novel distributed discretization
                 approach based on information entropy. This approach
                 has therefore been extended with fuzzy logic for
                 generating fuzzy partitions. Finally, considering the
                 complexity of the models generated by previous
                 solutions, we propose a distributed evolutionary
                 approach for optimizing both accuracy and
                 interpretability of the classifiers. The proposed
                 algorithms are shaped according to the MapReduce
                 programming model and have been deployed on well-known
                 data processing frameworks, widely employed in research
                 as well as industrial contexts. The performance
                 evaluation has been carried out by using different big
                 data benchmarks and the results obtained by the
                 proposed approaches and by some state-of-the-art
                 distributed classification algorithms have been
                 extensively discussed in terms of accuracy, model
                 complexity, and computation time.",
  notes =        "is this GP?


Genetic Programming entries for Armando Segatori