Genetic Programming for Biomarker Detection in Mass Spectrometry Data

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Soha Ahmed and Mengjie Zhang and Lifeng Peng",
  title =        "Genetic Programming for Biomarker Detection in Mass
                 Spectrometry Data",
  booktitle =    "25th Australasian Joint Conference on Artificial
                 Intelligence, AI 2012",
  year =         "2012",
  editor =       "Michael Thielscher and Dongmo Zhang",
  volume =       "7691",
  series =       "Lecture Notes in Computer Science",
  pages =        "266--278",
  address =      "Sydney, Australia",
  month =        dec # " 4-7",
  publisher =    "Springer",
  keywords =     "genetic algorithms, genetic programming",
  isbn13 =       "978-3-642-35100-6",
  DOI =          "doi:10.1007/978-3-642-35101-3_23",
  abstract =     "Classification of mass spectrometry (MS) data is an
                 essential step for biomarker detection which can help
                 in diagnosis and prognosis of diseases. However, due to
                 the high dimensionality and the small sample size,
                 classification of MS data is very challenging. The
                 process of biomarker detection can be referred to as
                 feature selection and classification in terms of
                 machine learning. Genetic programming (GP) has been
                 widely used for classification and feature selection,
                 but it has not been effectively applied to biomarker
                 detection in the MS data. In this study we develop a GP
                 based approach to feature selection, feature extraction
                 and classification of mass spectrometry data for
                 biomarker detection. In this approach, we firstly use
                 GP to reduce the redundant features by selecting a
                 small number of important features and constructing
                 high-level features, then we use GP to classify the
                 data based on selected features and constructed
                 features. This approach is examined and compared with
                 three well-known machine learning methods, namely
                 decision trees, naive Bayes and support vector machines,
                 on two biomarker detection data sets. The results show
                 that the proposed GP method can effectively select a
                 small number of important features from thousands of
                 original features for these problems, the constructed
                 high-level features can further improve the
                 classification performance, and the GP method
                 outperforms the three existing methods, namely naive
                 Bayes, SVMs and J48, on these problems.",
