Feature Selection and Classification of High Dimensional Mass Spectrometry Data: A Genetic Programming Approach

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Soha Ahmed and Mengjie Zhang and Lifeng Peng",
  title =        "Feature Selection and Classification of High
                 Dimensional Mass Spectrometry Data: A Genetic
                 Programming Approach",
  booktitle =    "11th European Conference on Evolutionary Computation,
                 Machine Learning and Data Mining in Bioinformatics,
                 {EvoBIO 2013}",
  year =         "2013",
  editor =       "Leonardo Vanneschi and William S. Bush and 
                 Mario Giacobini",
  month =        apr # " 3-5",
  series =       "LNCS",
  volume =       "7833",
  publisher =    "Springer Verlag",
  organisation = "EvoStar",
  address =      "Vienna, Austria",
  pages =        "43--55",
  keywords =     "genetic algorithms, genetic programming",
  isbn13 =       "978-3-642-37188-2",
  DOI =          "10.1007/978-3-642-37189-9_5",
  abstract =     "Biomarker discovery using mass spectrometry (MS) data
                 is very useful in disease detection and drug discovery.
                 The process of biomarker discovery in MS data must
                 start with feature selection as the number of features
                 in MS data is extremely large (e.g. thousands) while
                 the number of samples is comparatively small. In this
                 study, we propose the use of genetic programming (GP)
                 for automatic feature selection and classification of
                 MS data. This GP-based approach works by using the
                 features selected by two feature selection metrics,
                 namely information gain (IG) and relief-f (REFS-F) in
                 the terminal set. The feature selection performance of
                 the proposed approach is examined and compared with IG
                 and REFS-F alone on five MS data sets with different
                 numbers of features and instances. Naive Bayes (NB),
                 support vector machines (SVMs) and J48 decision trees
                 (J48) are used in the experiments to evaluate the
                 classification accuracy of the selected features.
                 Meanwhile, GP is also used as a classification method
                 in the experiments and its performance is compared with
                 that of NB, SVMs and J48. The results show that GP as a
                 feature selection method can select a smaller number of
                 features with better classification performance than IG
                 and REFS-F using NB, SVMs and J48. In addition, GP as a
                 classification method also outperforms NB and J48 and
                 achieves comparable or slightly better performance than
                 SVMs on these data sets.",
