A hybrid under-sampling approach for mining unbalanced datasets: applications to banking and insurance

Created by W.Langdon from gp-bibliography.bib Revision:1.4208

  title =        "A hybrid under-sampling approach for mining unbalanced
                 datasets: applications to banking and insurance",
  author =       "Madireddi Vasu and Vadlamani Ravi",
  publisher =    "Inderscience Publishers",
  year =         "2011",
  month =        mar # "~03",
  volume =       "3",
  keywords =     "genetic algorithms, genetic programming, insurance
                 fraud detection, credit card churn prediction, data
                 mining, unbalanced datasets, machine learning, banking,
                 classifiers, classifier performance, k-means
                 clustering, support vector machines, SVM, logistic
                 regression, multilayer perceptron, radial basis
                 function networks, RBF neural networks, GMDH, decision
  ISSN =         "1759-1171",
  bibsource =    "OAI-PMH server at www.inderscience.com",
  journal =      "Int. J. of Data Mining, Modelling and Management",
  issue =        "1",
  language =     "eng",
  pages =        "75--105",
  relation =     "ISSN online: 1759-1171 ISSN print: 1759-1163",
  rights =       "Inderscience Copyright",
  source =       "IJDMMM (2011), Vol 3 Issue 1, pp 75 - 105",
  URL =          "http://www.inderscience.com/link.php?id=38812",
  DOI =          "doi:10.1504/IJDMMM.2011.038812",
  abstract =     "In solving unbalanced classification problems, machine
                 learning algorithms are overwhelmed by the majority
                 class and consequently misclassify the minority class
                 observations. Here, we propose a hybrid under-sampling
                 approach to improve the performance of classifiers. The
                 proposed approach first employs the k-reverse nearest
                 neighbour (kRNN) method to detect outliers in the
                 majority class. After the outliers are removed, K-means
                 clustering is used to select K clusters, further
                 reducing the influence of the majority class. Then, we
                 employed support vector machine (SVM), logistic
                 regression (LR), multilayer perceptron (MLP), radial
                 basis function network (RBF), group method of data
                 handling (GMDH), genetic programming (GP) and decision
                 tree (J48) classifiers. The effectiveness of the
                 proposed approach was demonstrated on datasets taken
                 from insurance fraud detection and credit card churn
                 in the banking domain. Ten-fold cross-validation
                 was used in the study. It is observed
                 that the proposed approach improved the performance of
                 the classifiers.",
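
The two-stage under-sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameters, so the outlier rule (a majority point is dropped when fewer than `min_reverse` other points list it among their k nearest neighbours), the use of cluster centres as the reduced majority set, and all function names below are assumptions made for illustration only.

```python
import math
import random


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    order = sorted(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )
    return order[:k]


def krnn_filter(points, k=3, min_reverse=1):
    """Keep points that occur in at least `min_reverse` other points' k-NN lists.

    Assumption: a point with too few reverse neighbours counts as an outlier.
    """
    reverse_count = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            reverse_count[j] += 1
    return [p for p, c in zip(points, reverse_count) if c >= min_reverse]


def kmeans_centres(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns the cluster centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        groups = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: math.dist(p, centres[c]))
            groups[nearest].append(p)
        # Each centre becomes the mean of its group; an empty group keeps its old centre.
        centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[c]
            for c, g in enumerate(groups)
        ]
    return centres


def undersample_majority(majority, n_clusters, k=3, min_reverse=1, seed=0):
    """kRNN outlier removal followed by k-means reduction of the majority class."""
    cleaned = krnn_filter(majority, k=k, min_reverse=min_reverse)
    # Assumption: the cluster centres stand in for the majority class afterwards.
    return kmeans_centres(cleaned, n_clusters, seed=seed)
```

Under these assumptions, the reduced majority set returned by `undersample_majority` would be combined with the untouched minority class before training any of the classifiers listed in the abstract. The brute-force neighbour search is quadratic in the number of majority points, which is acceptable for a sketch but not for large datasets.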
