A Symbolic Regression Based Scoring System Improving Peptide Identifications for MS Amanda

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Viktoria Dorfer and Sergey Maltsev and 
                 Stephan Dreiseitl and Karl Mechtler and Stephan M. Winkler",
  title =        "A Symbolic Regression Based Scoring System Improving
                 Peptide Identifications for MS Amanda",
  booktitle =    "GECCO 2015 Medical Applications of Genetic and
                 Evolutionary Computation (MedGEC'15) Workshop",
  year =         "2015",
  editor =       "Stephen L. Smith and Stefano Cagnoni and 
                 Robert M. Patton",
  isbn13 =       "978-1-4503-3488-4",
  keywords =     "genetic algorithms, genetic programming",
  pages =        "1335--1341",
  month =        "11-15 " # jul,
  organisation = "SIGEVO",
  address =      "Madrid, Spain",
  URL =          "http://doi.acm.org/10.1145/2739482.2768509",
  DOI =          "10.1145/2739482.2768509",
  publisher =    "ACM",
  publisher_address = "New York, NY, USA",
  abstract =     "Peptide search engines are algorithms that are able to
                 identify peptides (i.e., short proteins or parts of
                 proteins) from mass spectra of biological samples.
                 These identification algorithms report the best
                 matching peptide for a given spectrum and a score that
                 represents the quality of the match; usually, the
                 higher this score, the higher is the reliability of the
                 respective match. In order to estimate the specificity
                 and sensitivity of search engines, sets of target
                 sequences are given to the identification algorithm as
                 well as so-called decoy sequences that are randomly
                 created or scrambled versions of real sequences; decoy
                 sequences should be assigned low scores whereas target
                 sequences should be assigned high scores.

                 In this paper we present an approach based on symbolic
                 regression (using genetic programming) that helps to
                 distinguish between target and decoy matches. On the
                 basis of features calculated for matched sequences and
                 using the information on the original sequence set
                 (target or decoy) we learn mathematical models that
                 calculate updated scores. As an alternative to this
                 white box modelling approach we also use a black box
                 modelling method, namely random forests.

                 As we show in the empirical section of this paper, this
                 approach leads to scores that increase the number of
                 reliably identified samples that are originally scored
                 using the MS Amanda identification algorithm for high
                 resolution as well as for low resolution mass
  notes =        "Also known as \cite{2768509} Distributed at
