Towards Benchmarking Feature Subset Selection Methods for Software Fault Prediction

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Wasif Afzal and Richard Torkar",
  title =        "Towards Benchmarking Feature Subset Selection Methods
                 for Software Fault Prediction",
  booktitle =    "Computational Intelligence and Quantitative Software
  publisher =    "Springer",
  year =         "2016",
  editor =       "Witold Pedrycz and Giancarlo Succi and 
                 Alberto Sillitti",
  volume =       "617",
  series =       "Studies in Computational Intelligence",
  chapter =      "3",
  pages =        "33--58",
  keywords =     "genetic algorithms, genetic programming, SBSE, Feature
                 subset selection, Fault prediction, Empirical",
  isbn13 =       "978-3-319-25964-2",
  DOI =          "doi:10.1007/978-3-319-25964-2_3",
  abstract =     "Despite the general acceptance that software
                 engineering datasets often contain noisy, irrelevant or
                 redundant variables, very few benchmark studies of
                 feature subset selection (FSS) methods on real-life
                 data from software projects have been conducted. This
                 paper provides an empirical comparison of
                 state-of-the-art FSS methods: information gain
                 attribute ranking (IG); Relief (RLF); principal
                 component analysis (PCA); correlation-based feature
                 selection (CFS); consistency-based subset evaluation
                 (CNS); wrapper subset evaluation (WRP); and an
                 evolutionary computation method, genetic programming
                 (GP), on five fault prediction datasets from the
                 PROMISE data repository. For all the datasets, the area
                 under the receiver operating characteristic curve—the
                 AUC value averaged over 10-fold cross-validation
                 runs—was calculated for each FSS method-dataset
                 combination before and after FSS. Two diverse learning
                 algorithms, C4.5 and naive Bayes (NB) are used to test
                 the attribute sets given by each FSS method. The
                 results show that although there are no statistically
                 significant differences between the AUC values for the
                 different FSS methods for both C4.5 and NB, a smaller
                 set of FSS methods (IG, RLF, GP) consistently select
                 fewer attributes without degrading classification
                 accuracy. We conclude that in general, FSS is
                 beneficial as it helps improve classification accuracy
                 of NB and C4.5. There is no single best FSS method for
                 all datasets but IG, RLF and GP consistently select
                 fewer attributes without degrading classification
                 accuracy within statistically significant boundaries.",
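
The evaluation protocol described in the abstract (AUC averaged over 10-fold cross-validation, computed before and after feature subset selection) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' setup: the data are synthetic binary features rather than the PROMISE datasets, the learner is a small count-based naive Bayes, only information gain ranking stands in for the FSS methods, the folds are stratified, and the ranking is recomputed inside each training fold. The names `cv_auc`, `nb_fit` and `nb_score` are hypothetical helpers.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of one discrete feature: H(y) minus the expected H(y | feature value)."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def auc(scores, labels):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def nb_fit(X, y, features):
    """Count-based naive Bayes over the chosen binary feature indices."""
    counts = defaultdict(Counter)
    for row, label in zip(X, y):
        for f in features:
            counts[(f, label)][row[f]] += 1
    return {"prior": Counter(y), "counts": counts, "features": features}


def nb_score(model, row):
    """P(y = 1 | row) with add-one smoothing (two values per feature assumed)."""
    logp = []
    for label in (0, 1):
        total = model["prior"][label]
        lp = math.log(total + 1)
        for f in model["features"]:
            lp += math.log((model["counts"][(f, label)][row[f]] + 1) / (total + 2))
        logp.append(lp)
    top = max(logp)
    e0, e1 = math.exp(logp[0] - top), math.exp(logp[1] - top)
    return e1 / (e0 + e1)


def cv_auc(X, y, k=10, top=None, seed=0):
    """Mean AUC over k stratified folds; if `top` is set, keep the top-IG features."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    pos = [i for i in idx if y[i] == 1]
    neg = [i for i in idx if y[i] == 0]
    folds = [pos[j::k] + neg[j::k] for j in range(k)]  # both classes in every fold
    aucs = []
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        features = list(range(len(X[0])))
        if top is not None:
            # Rank on the training fold only, so the test fold does not leak in.
            features.sort(key=lambda f: information_gain([r[f] for r in X_tr], y_tr),
                          reverse=True)
            features = features[:top]
        model = nb_fit(X_tr, y_tr, features)
        aucs.append(auc([nb_score(model, X[i]) for i in test], [y[i] for i in test]))
    return sum(aucs) / k


# Synthetic data: feature 0 tracks the label (with some flips), features 1-4 are noise.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[1 if (label == 1 or i % 10 == 0) else 0] + [rng.randint(0, 1) for _ in range(4)]
     for i, label in enumerate(y)]

auc_all = cv_auc(X, y)          # before FSS: all five features
auc_top1 = cv_auc(X, y, top=1)  # after FSS: single highest-IG feature
print(f"AUC all features: {auc_all:.3f}, AUC top-1 feature: {auc_top1:.3f}")
```

Comparing `auc_all` with `auc_top1` mirrors the before/after comparison in the abstract on a toy scale; a faithful replication would swap in the actual datasets, C4.5 and NB learners, and the full set of FSS methods listed above.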
