Prediction of Fault-Prone Software Modules using Statistical and Machine Learning Methods

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Yogesh Singh and Arvinder Kaur and Ruchika Malhotra",
  title =        "Prediction of Fault-Prone Software Modules using
                 Statistical and Machine Learning Methods",
  journal =      "International Journal of Computer Applications",
  year =         "2010",
  volume =       "1",
  number =       "22",
  pages =        "6--13",
  month =        feb,
  publisher =    "Foundation of Computer Science",
  keywords =     "genetic algorithms, genetic programming, gene
                 expression programming",
  ISSN =         "0975-8887",
  bibsource =    "OAI-PMH server at",
  oai =          "oai:doaj-articles:ecff8a1732b1821d2f68a0c12737032c",
  URL =          "",
  URL =          "",
  DOI =          "doi:10.5120/525-685",
  abstract =     "Demand for producing quality software has increased
                 rapidly during the last few years. This is leading to
                 an increase in the development of machine learning
                 methods for exploring data sets, which can be used to
                 construct models for predicting quality attributes such
                 as fault proneness, maintenance effort, testing effort,
                 productivity and reliability. This paper examines and
                 compares logistic regression and six machine learning
                 methods (artificial neural network, decision tree,
                 support vector machine, cascade correlation network,
                 group method of data handling polynomial method and
                 gene expression programming). These methods are
                 explored empirically to find the effect of static code
                 metrics on the fault proneness of software modules. We
                 use the publicly available data set AR1 to analyse and
                 compare the regression and machine learning methods in
                 this study. The performance of the methods is compared
                 by computing the area under the curve (AUC) using
                 Receiver Operating Characteristic (ROC) analysis. The
                 results show that the model predicted using decision
                 tree modelling has an area under the ROC curve of
                 0.865, which is better than the models predicted using
                 regression and the other machine learning methods. The
                 study shows that the machine learning methods are
                 useful in constructing software quality models.",

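The abstract ranks the models by the area under the ROC curve. As a minimal sketch (not the authors' code or tooling), the AUC can be computed as the probability that a randomly chosen fault-prone module receives a higher predicted fault-proneness score than a randomly chosen non-faulty module, with ties counted as one half (the Mann-Whitney formulation). The function name `roc_auc` and the toy labels and scores below are hypothetical and are not drawn from the AR1 data set.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 = fault-prone module, 0 = not fault-prone.
    scores: predicted fault-proneness (higher means more likely faulty).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one faulty and one non-faulty module")
    wins = 0.0
    # Compare every (faulty, non-faulty) pair of predicted scores.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # faulty module correctly ranked higher
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions (illustrative only, not AR1 results).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.6, 0.1, 0.8]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of faulty from non-faulty modules. The pairwise loop is O(n_pos × n_neg), which is adequate for a data set of AR1's size; a rank-based version avoids the quadratic cost on larger data.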