Parallel Hybrid Clustering using Genetic Programming and Multi-Objective Fitness with Density (PYRAMID)

Created by W.Langdon from gp-bibliography.bib Revision:1.4340

  author =       "Samir R. Tout",
  title =        "Parallel Hybrid Clustering using Genetic Programming
                 and Multi-Objective Fitness with Density {(PYRAMID)}",
  school =       "Computer and Information Sciences, Nova Southeastern
  year =         "2006",
  month =        may # " 3",
  keywords =     "genetic algorithms, genetic programming",
  ISBN =         "0-542-56083-6",
  URL =          "",
  URL =          "",
  URL =          "",
  size =         "295 pages",
  abstract =     "Clustering is the art of locating patterns in large
                 data sets. It is an active research area that provides
                 value to scientific as well as business applications.
                 There are some challenges that face practical
                 clustering including: identifying clusters of arbitrary
                 shapes, sensitivity to the order of input, dynamic
                 determination of the number of clusters, outlier
                 handling, high dependency on user-defined parameters,
                 processing speed of massive data sets, and the
                 potential to fall into sub-optimal solutions.

                 Many studies that were conducted in the realm of
                 clustering have addressed some of these challenges.
                 This study proposes a new approach, called parallel
                 hybrid clustering using genetic programming and
                 multi-objective fitness with density (PYRAMID), that
                 tackles several of these challenges from a different

                 PYRAMID employs genetic programming to represent
                 arbitrary cluster shapes and circumvent falling in
                 local optima. It accommodates large data sets and
                 avoids dependency on the order of input by quantizing
                 the data space, i.e., the space on which the data set
                 resides, thus abstracting it into hyper-rectangular
                 cells and creating genetic programming individuals as
                 concatenations of these cells. Thus the cells become
                 the subject of clustering, rather than the data points
                 themselves. PYRAMID also uses a density-based
                 multi-objective fitness function to handle outliers. It
                 gathers statistics in a pre-processing step and uses
                 them so not to rely on user-defined parameters.
                 Finally, PYRAMID employs data parallelism in a
                 master-slave model in an attempt to cure the inherent
                 slow performance of evolutionary algorithms and provide
                 speedup. A master processor distributes the clustering
                 data evenly onto multiple slave processors. The slave
                 processors conduct the clustering on their local data
                 sets and report their clustering results back to the
                 master, which consolidates them by merging the partial
                 results into a final clustering solution. This last
                 step also involves determining the number of clusters
                 dynamically and labeling them accordingly.

                 Experiments have demonstrated that, using these
                 features, PYRAMID offers an advantage over some of the
                 existing approaches by tackling the clustering
                 challenges from a different angle.",
  notes =        "Supervisor: Junping Sun

                 UMI Microform 3208008",

Genetic Programming entries for Samir Tout