Motif kernel generated by genetic programming improves remote homology and fold detection

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

@Article{oai:biomedcentral.com:1471-2105-8-23,
  title =        "Motif kernel generated by genetic programming improves
                 remote homology and fold detection",
  author =       "Tony H{\aa}ndstad and Arne J H Hestnes and P{\aa}l S{\ae}trom",
  journal =      "BMC Bioinformatics",
  year =         "2007",
  volume =       "8",
  number =       "23",
  month =        jan # "~25",
  publisher =    "BioMed Central Ltd.",
  bibsource =    "OAI-PMH server at www.biomedcentral.com",
  language =     "en",
  oai =          "oai:biomedcentral.com:1471-2105-8-23",
  rights =       "Copyright 2007 H{\aa}ndstad et al; licensee BioMed
                 Central Ltd.",
  keywords =     "genetic algorithms, genetic programming, GPkernel,
                 SVM, MISD, boosting",
  ISSN =         "1471-2105",
  URL =          "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.276.5386",
  URL =          "http://www.biomedcentral.com/content/pdf/1471-2105-8-23.pdf",
  URL =          "http://www.biomedcentral.com/1471-2105/8/23",
  DOI =          "doi:10.1186/1471-2105-8-23",
  url_undergraduate_thesis = "http://www.diva-portal.org/diva/getDocument?urn_nbn_no_ntnu_diva-1030-1__fulltext.pdf",
  size =         "16 pages",
  abstract =     "Background

                 Protein remote homology detection is a central problem
                 in computational biology. Most recent methods train
                 support vector machines to discriminate between related
                 and unrelated sequences and these studies have
                 introduced several types of kernels. One successful
                 approach is to base a kernel on shared occurrences of
                 discrete sequence motifs. Still, many protein sequences
                 fail to be classified correctly for lack of a
                 suitable set of motifs for these sequences.

                 Results

                 We introduce the GPkernel, which is a motif kernel
                 based on discrete sequence motifs where the motifs are
                 evolved using genetic programming. All proteins can be
                 grouped according to evolutionary relations and
                 structure, and the method uses this inherent structure
                 to create groups of motifs that discriminate between
                 different families of evolutionary origin. When tested
                 on two SCOP benchmarks, the superfamily and fold
                 recognition problems, the GPkernel gives significantly
                 better results than related methods of remote
                 homology detection.

                 Conclusion

                 The GPkernel gives particularly good results on the
                 more difficult fold recognition problem compared to the
                 other methods. This is mainly because the method
                 creates motif sets that describe similarities among
                 subgroups of both the related and unrelated proteins.
                 This rich set of motifs gives a better description of
                 the similarities and differences between different
                 folds than do previous motif-based methods.",
  notes =        "PMID: 1794419

                 Undergraduate thesis: Tony H{\aa}ndstad, Protein Remote
                 Homology Detection using Motifs made with Genetic
                 Programming (2007-03-30), 118 pages.
                 http://urn.ub.uu.se/resolve?urn=urn:nbn:no:ntnu:diva-1030

                 Binary feature vectors. 2 seconds run time (PC+search
                 chip). GPboost, SCOP, eMOTIF kernel, ROC, classifier
                 combination. 'GPkernel performs significantly better
                 than the other motif-based methods' p5. GPextended.
                 Evolves regular expressions. 'In addition to [the 20]
                 amino acid characters, the motifs are also made from
                 the disjunction operator (|) wildcard (.) and Hamming
                 distance {:p>=x} that specifies the minimum number of
                 characters that must match in the pattern.' p13.",
}
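The motif language described in the notes (amino-acid letters, the disjunction `|`, the wildcard `.`, and a Hamming-distance constraint giving the minimum number of characters that must match) can be illustrated with a small matcher. This is a minimal sketch under stated assumptions. The hypothetical `parse_motif` accepts an invented space-separated syntax. The `Motif` class is also hypothetical, and it restricts disjunction to single residues at one position. None of this is the paper's notation. The notes report timings on a PC plus a search chip, not on Python code like this. Here `min_matches` stands in for `{:p>=x}` by counting matches over the non-wildcard positions only.

```python
from typing import FrozenSet, List, Optional

# Hypothetical, simplified motif model for illustration only; this is not
# the GPkernel's actual motif representation or syntax.
Position = Optional[FrozenSet[str]]  # None plays the role of the wildcard '.'


class Motif:
    """Fixed-length motif: each position is a set of allowed residues
    (a disjunction such as A|G) or None (wildcard)."""

    def __init__(self, positions: List[Position],
                 min_matches: Optional[int] = None) -> None:
        self.positions = positions
        fixed = sum(1 for p in positions if p is not None)
        # Stand-in for the Hamming-distance constraint {:p>=x}: minimum number
        # of non-wildcard positions that must match (default: all of them).
        self.min_matches = fixed if min_matches is None else min_matches

    def matches_at(self, seq: str, start: int) -> bool:
        window = seq[start:start + len(self.positions)]
        if len(window) < len(self.positions):
            return False
        hits = sum(1 for p, aa in zip(self.positions, window)
                   if p is not None and aa in p)
        return hits >= self.min_matches

    def occurs_in(self, seq: str) -> bool:
        # Slide the motif over every possible start position.
        return any(self.matches_at(seq, i)
                   for i in range(len(seq) - len(self.positions) + 1))


def parse_motif(pattern: str, min_matches: Optional[int] = None) -> Motif:
    """Parse a toy syntax of space-separated tokens, e.g. "C . A|G H"."""
    positions: List[Position] = [
        None if tok == "." else frozenset(tok.split("|"))
        for tok in pattern.split()
    ]
    return Motif(positions, min_matches)
```

For example, `parse_motif("C . A|G H")` occurs in `MKCWGHL` (via the window `CWGH`) but not in `MKCWTHL`. With `min_matches=2`, the window `CWTH` also counts as an occurrence, because `C` and `H` still match.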
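The abstract describes a kernel based on shared occurrences of discrete sequence motifs, and the notes mention binary feature vectors. The idea can be sketched as mapping each protein to a 0/1 vector over a motif set and taking the dot product, which counts the motifs two sequences share. This is a minimal sketch with several assumptions. The motifs are a fixed, hand-written list of ordinary Python regular expressions, which cover `|` and `.` but not the Hamming-distance operator. The function names and example sequences are hypothetical. The cosine normalization is an optional extra, not taken from the paper. The published GPkernel evolves its motif sets with genetic programming and may weight or combine features differently.

```python
import math
import re
from typing import List, Sequence

# Hypothetical helpers illustrating a motif kernel on binary features;
# not the published GPkernel implementation.


def feature_vector(seq: str, motifs: Sequence[re.Pattern]) -> List[int]:
    """Binary features: 1 if the motif occurs anywhere in seq, else 0."""
    return [1 if m.search(seq) else 0 for m in motifs]


def dot(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a * b for a, b in zip(u, v))


def motif_kernel(x: str, y: str, motifs: Sequence[re.Pattern]) -> int:
    """Number of motifs that occur in both x and y."""
    return dot(feature_vector(x, motifs), feature_vector(y, motifs))


def normalized_motif_kernel(x: str, y: str,
                            motifs: Sequence[re.Pattern]) -> float:
    """Cosine-normalized variant (an assumption, not from the paper)."""
    kxx = motif_kernel(x, x, motifs)
    kyy = motif_kernel(y, y, motifs)
    if kxx == 0 or kyy == 0:
        return 0.0
    return motif_kernel(x, y, motifs) / math.sqrt(kxx * kyy)


def gram_matrix(seqs: Sequence[str],
                motifs: Sequence[re.Pattern]) -> List[List[int]]:
    """Pairwise kernel values for a list of sequences."""
    vecs = [feature_vector(s, motifs) for s in seqs]
    return [[dot(u, v) for v in vecs] for u in vecs]


if __name__ == "__main__":
    motifs = [re.compile(r"C.[AG]H"), re.compile(r"GG(K|R)"),
              re.compile(r"W..W")]
    seqs = ["MKCWGHLGGK", "ACLAHTTGGR", "WPPWQ"]
    print(gram_matrix(seqs, motifs))  # [[2, 2, 0], [2, 2, 0], [0, 0, 1]]
```

The first two toy sequences share both the `C.[AG]H` and `GG(K|R)` motifs, so their kernel value is 2. The third shares none with them. A matrix like this could be given to an SVM that accepts a precomputed kernel; scikit-learn's `SVC(kernel="precomputed")` is one such option.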

Genetic Programming entries for Tony Handstad, Arne Johan H Hestnes, Pal Saetrom
