Automatic Discovery Using Genetic Programming of an Unknown-Sized Detector of Protein Motifs Containing Repeatedly-used Subexpressions

Created by W. Langdon from gp-bibliography.bib Revision:1.3872

@InProceedings{koza:1995:protien,
  author =       "John R. Koza and David Andre",
  title =        "Automatic Discovery Using Genetic Programming of an
                 Unknown-Sized Detector of Protein Motifs Containing
                 Repeatedly-used Subexpressions",
  booktitle =    "Proceedings of the Workshop on Genetic Programming:
                 From Theory to Real-World Applications",
  year =         "1995",
  editor =       "Justinian P. Rosca",
  pages =        "89--97",
  address =      "Tahoe City, California, USA",
  month =        "9 " # jul,
  keywords =     "genetic algorithms, genetic programming",
  URL =          "http://www.genetic-programming.com/jkpdf/ml1995motif.pdf",
  size =         "9 pages",
  abstract =     "Automated methods of machine learning may be useful in
                 discovering biologically meaningful patterns that are
                 hidden in the rapidly growing databases of genomic and
                 protein sequences. However, almost all existing methods
                 of automated discovery require that the user specify,
                 in advance, the size and shape of the pattern that is
                 to be discovered. Moreover, existing methods do not
                 have a workable analog of the idea of a reusable
                 subroutine to exploit the recurring sub-patterns of a
                 problem environment. Genetic programming can evolve
                 complicated problem-solving expressions of unspecified
                 size and shape. When automatically defined functions
                 are added to genetic programming, genetic programming
                 becomes capable of efficiently capturing and exploiting
                 recurring sub-patterns. This paper describes how
                 genetic programming with automatically defined
                 functions successfully evolved motifs for detecting the
                 D-E-A-D box family of proteins and for detecting the
                 manganese superoxide dismutase family. Both motifs were
                 evolved without prespecifying their length. Both
                 evolved motifs employed automatically defined functions
                 to capture the repeated use of common subexpressions.
                 When tested against the SWISS-PROT database of
                 proteins, the two genetically evolved consensus motifs
                 detect the two families either as well as, or slightly
                 better than, the comparable human-written motifs found
                 in the PROSITE database.",
  notes =        "GP successfully evolved motifs for detecting the D-E-A-D
                 box family of proteins that worked as well as or better
                 than the human-written PROSITE motifs. Part of
                 \cite{rosca:1995:ml}",
}
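To make the abstract's idea concrete, here is a minimal Python sketch of a consensus motif that reuses a subexpression and has no fixed length. It is a hypothetical illustration, not the representation or the evolved D-E-A-D box or manganese superoxide dismutase motifs reported in the paper. The residue class `ILVMF`, the motif `MOTIF`, and the helper names `residue_in`, `exactly`, `ADF0` and `matches` are all assumptions. `ADF0` defines a residue class once and is called at several motif positions, loosely mimicking what an automatically defined function provides.

```python
from typing import Callable, List

# A motif position is a predicate over a single amino-acid letter.
Predicate = Callable[[str], bool]


def residue_in(allowed: str) -> Predicate:
    """Hypothetical helper: match any residue in the given set of letters."""
    return lambda r: r in allowed


def exactly(aa: str) -> Predicate:
    """Hypothetical helper: match exactly one amino-acid letter."""
    return lambda r: r == aa


# Hypothetical reusable subexpression, in the spirit of an ADF: the residue
# class is defined once and referenced at several positions of the motif.
ADF0: Predicate = residue_in("ILVMF")

# Hypothetical consensus motif (invented for illustration, not from the paper).
# Its length is simply len(MOTIF); the matcher below does not fix it in advance.
MOTIF: List[Predicate] = [ADF0, ADF0, exactly("D"), exactly("E"), ADF0]


def matches(motif: List[Predicate], seq: str) -> bool:
    """Return True if the motif matches some contiguous window of seq."""
    n = len(motif)
    for start in range(len(seq) - n + 1):
        if all(pred(seq[start + k]) for k, pred in enumerate(motif)):
            return True
    return False


if __name__ == "__main__":
    print(matches(MOTIF, "GGLVDEIKK"))  # window LVDEI satisfies every position
    print(matches(MOTIF, "GGLVDEAKK"))  # A is outside the assumed ILVMF class
```

`matches` reads the motif length from the list, so it accepts motifs of any size. This loosely mirrors the abstract's point that the evolved motifs were found without prespecifying their length.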

Genetic Programming entries for John Koza, David Andre
