Aggressive and Effective Feature Selection using Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  title =        "Aggressive and Effective Feature Selection using
                 Genetic Programming",
  author =       "Isac Sandin and Guilherme Andrade and 
                 Felipe Viegas and Daniel Madeira and Leonardo Rocha and 
                 Thiago Salles and Marcos Andre Goncalves",
  pages =        "2718--2725",
  booktitle =    "Proceedings of the 2012 IEEE Congress on Evolutionary
                 Computation",
  year =         "2012",
  editor =       "Xiaodong Li",
  month =        "10-15 " # jun,
  DOI =          "10.1109/CEC.2012.6252878",
  address =      "Brisbane, Australia",
  ISBN =         "0-7803-8515-2",
  keywords =     "genetic algorithms, genetic programming, Data mining,
                 Learning classifier systems",
  abstract =     "One of the major challenges in automatic
                 classification is to deal with highly dimensional data.
                 Several dimensionality reduction strategies, including
                 popular feature selection metrics such as Information
                 Gain and Chi-squared, have already been proposed to
                 deal with this situation. However, these strategies are
                 not well suited when the data is very skewed, a common
                 situation in real-world data sets. This occurs when the
                 number of samples in one class is much larger than the
                 others, causing common feature selection metrics to be
                 biased towards the features observed in the largest
                 class. In this paper, we propose the use of Genetic
                 Programming (GP) to implement an aggressive, yet very
                 effective, selection of attributes. Our GP-based
                 strategy is able to largely reduce dimensionality,
                 while dealing effectively with skewed data. To this
                 end, we exploit some of the most common feature
                 selection metrics and, with GP, combine their results
                 into new sets of features, obtaining a better unbiased
                 estimate for the discriminative power of each feature.
                 Our proposal was evaluated against each individual
                 feature selection metric used in our GP-based solution
                 (namely, Information Gain, Chi-squared, Odds-Ratio,
                 Correlation Coefficient) using a k8 cancer-rescue
                 mutants data set, a very unbalanced collection
                 referring to examples of p53 protein. For this data
                 set, our solution not only increases the efficiency of
                 the learning algorithms, with an aggressive reduction
                 of the input space, but also significantly increases
                 their accuracy.",
  notes =        "WCCI 2012. CEC 2012 - A joint meeting of the IEEE, the
                 EPS and the IET.",

Genetic Programming entries for Isac Sandin, Guilherme Andrade, Felipe Viegas, Daniel Madeira, Leonardo Rocha, Thiago Cunha de Moura Salles, Marcos Andre Goncalves