Prediction of cancer class with majority voting genetic programming classifier using gene expression data

Created by W.Langdon from gp-bibliography.bib Revision:1.3973

@Article{Paul:2007:TCBB,
  author =       "Topon Kumar Paul and Hitoshi Iba",
  title =        "Prediction of cancer class with majority voting
                 genetic programming classifier using gene expression
                 data",
  journal =      "IEEE/ACM Transactions on Computational Biology and
                 Bioinformatics",
  year =         "2009",
  month =        apr # "-" # jun,
  volume =       "6",
  number =       "2",
  pages =        "353--367",
  keywords =     "genetic algorithms, genetic programming, Classifier
                 design and evaluation, Data mining, Feature extraction
                 or construction, Evolutionary computing, AdaBoost.M1,
                 kNN, SVM, RPMBGA, EGPC Java, MVGPC",
  ISSN =         "1545-5963",
  DOI =          "10.1109/TCBB.2007.70245",
  size =         "14 pages",
  abstract =     "In order to get a better understanding of different
                 types of cancers and to find the possible biomarkers
                 for diseases, recently many researchers are analysing
                 the gene expression data using various machine learning
                 techniques. However, due to smaller number of training
                 samples compared to huge number of genes and class
                 imbalance, most of these methods suffer from
                 over-fitting. In this article, we present a majority
                 voting genetic programming classifier (MVGPC) for
                 classification of microarray data. Instead of a single
                 rule or a single set of rules, we evolve multiple rules
                 with genetic programming and then apply those rules to
                 test samples to determine their labels with majority
                 voting technique. By performing experiments on four
                 different public cancer data sets, including multiclass
                 data sets, we have found that the test accuracies of
                 MVGPC are better than those of other methods including
                 AdaBoost with genetic programming. Moreover, some of
                 the more frequently occurring genes in the
                 classification rules are known to be associated with
                 the types of cancers being studied in this article.",
  notes =        "4 GeneChip datasets (brain cancer, prostate cancer,
                 breast cancer, lung carcinoma). Small sample sizes: 50,
                 102, 22, 203. Preprocessing reduces to 4434, 5966,
                 3226, 3312 genes. Affymetrix software (gene present (P),
                 marginal (M), or absent (A)). pop=4000, max rule
                 size=100, elitism. GP classifiers combined externally
                 by fixed rule (i.e. majority voting). MVGPC
                 multiclass=multiple one vs rest. Point mutation,
                 overfitting, log transforms. {"}it may not matter
                 whether the data are normalised or not.{"} No difference
                 (at 5 percent) between scaled and non-scaled.

                 Also known as \cite{10.1109/TCBB.2007.70245}
                 \cite{4359894}",
}
