Finding Relevant Attributes in High Dimensional Data: A Distributed Computing Hybrid Data Mining Strategy

Created by W.Langdon from gp-bibliography.bib Revision:1.4221

  author =       "Julio J. Valdes and Alan J. Barton",
  title =        "Finding Relevant Attributes in High Dimensional Data:
                 A Distributed Computing Hybrid Data Mining Strategy",
  year =         "2007",
  booktitle =    "Transactions on Rough Sets VI",
  publisher =    "Springer",
  volume =       "4374",
  series =       "Lecture Notes in Computer Science",
  keywords =     "genetic algorithms, genetic programming",
  pages =        "366--396",
  DOI =          "doi:10.1007/978-3-540-71200-8_20",
  bibsource =    "DBLP,",
  isbn13 =       "978-3-540-71198-8",
  abstract =     "In many domains the data objects are described in
                 terms of a large number of features (e.g. microarray
                 experiments, or spectral characterizations of organic
                 and inorganic samples). A pipelined approach using two
                 clustering algorithms in combination with Rough Sets is
                 investigated for the purpose of discovering important
                 combinations of attributes in high dimensional data.
                 The Leader and several k-means algorithms are used as
                 fast procedures for attribute set simplification of the
                 information systems presented to the rough sets
                 algorithms. The data described in terms of these fewer
                 features are then discretized with respect to the
                 decision attribute according to different rough set
                 based schemes. From them, the reducts and their derived
                 rules are extracted, which are applied to test data in
                 order to evaluate the resulting classification accuracy
                 in cross-validation experiments. The data mining process
                 is implemented within a high throughput distributed
                 computing environment. Nonlinear transformations of
                 attribute subsets preserving the similarity structure
                 of the data were also investigated. Their
                 classification ability and that of subsets of
                 attributes obtained after the mining process were
                 described in terms of analytic functions obtained by
                 genetic programming (gene expression programming), and
                 simplified using computer algebra systems. Visual data
                 mining techniques using virtual reality were used for
                 inspecting results. An exploration of this approach
                 (using Leukemia, Colon cancer and Breast cancer gene
                 expression data) was conducted in a series of
                 experiments. They led to small subsets of genes with
                 high discrimination power.",
