Automatic discovery of protein motifs using genetic programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4221

  author =       "John R. Koza and David Andre",
  title =        "Automatic discovery of protein motifs using genetic
                 programming",
  booktitle =    "Evolutionary Computation: Theory and Applications",
  publisher =    "World Scientific",
  year =         "1999",
  editor =       "Xin Yao",
  chapter =      "5",
  pages =        "171--197",
  address =      "Singapore",
  keywords =     "genetic algorithms, genetic programming, DEAD box,
                 SWISSPROT, PROSITE",
  ISBN =         "981-02-2306-4",
  URL =          "",
  abstract =     "Automated methods of machine learning may prove to be
                 useful in discovering biologically meaningful
                 information hidden in the rapidly growing databases of
                 DNA sequences and protein sequences. Genetic
                 programming is an extension of the genetic algorithm in
                 which a population of computer programs is bred, over a
                 series of generations, in order to solve a problem.
                 Genetic programming is capable of evolving complicated
                 problem-solving expressions of unspecified size and
                 shape. Moreover, when automatically defined functions
                 are added to genetic programming, genetic programming
                 becomes capable of efficiently capturing and exploiting
                 recurring sub-patterns. This chapter describes how
                 genetic programming with automatically defined
                 functions successfully evolved motifs for detecting the
                 D-E-A-D box family of proteins and for detecting the
                 manganese superoxide dismutase family. Both motifs were
                 evolved without prespecifying their length. Both
                 evolved motifs employed automatically defined functions
                 to capture the repeated use of common subexpressions.
                 When tested against the SWISS-PROT database of
                 proteins, the two genetically evolved consensus motifs
                 detect the two families either as well as, or slightly
                 better than, the comparable human-written motifs found
                 in the PROSITE database.",
  notes =        "ECTA. Two ADFs, each with OR in its function set (i.e.
                 a choice between two alternative amino acids at a
                 given position). The result-producing branch has AND
                 (i.e. two amino acids, or sets of amino acids,
                 adjacent along the backbone). Covariance fitness,
                 with a formula in terms of the numbers of true
                 positives etc.

                 Jury method: 12 motifs evolved by separate GP runs
                 are combined into one by requiring a unanimous jury
                 decision.
                 (Combined by hand or automatically?)

                 Parallel GP system, 64 transputer nodes.",
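
The notes above describe motifs built from OR (alternative amino acids at one position) and AND (adjacent positions), a fitness computed from true/false positive and negative counts, and a unanimous jury of motifs. A minimal Python sketch of those ideas follows. It rests on several assumptions: a motif is simplified to a list of sets of one-letter amino acid codes rather than an evolved program tree with ADFs, the fitness is taken to be the standard correlation coefficient over the four confusion counts (the entry does not give the exact formula), and all function names are hypothetical.

```python
import math


def motif_matches(motif, sequence):
    """Return True if the motif occurs anywhere in the sequence.

    Assumption: a motif is a list of sets. Each set is an OR over the
    amino acids allowed at that position; consecutive sets are ANDed
    over adjacent residues.
    """
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        if all(sequence[start + i] in allowed
               for i, allowed in enumerate(motif)):
            return True
    return False


def jury_matches(motifs, sequence):
    """Unanimous jury: every motif must match for a positive call."""
    return all(motif_matches(m, sequence) for m in motifs)


def correlation(tp, tn, fp, fn):
    """Correlation coefficient over confusion counts.

    Assumption: this stands in for the 'covariance fitness' mentioned
    in the notes. Returns 0.0 when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def score(classifier, labelled_sequences):
    """Count confusion outcomes for a classifier and return its correlation.

    labelled_sequences is an iterable of (sequence, is_family_member).
    """
    tp = tn = fp = fn = 0
    for sequence, is_member in labelled_sequences:
        predicted = classifier(sequence)
        if predicted and is_member:
            tp += 1
        elif predicted and not is_member:
            fp += 1
        elif not predicted and is_member:
            fn += 1
        else:
            tn += 1
    return correlation(tp, tn, fp, fn)
```

For example, `[{"D"}, {"E"}, {"A"}, {"D"}]` is a fixed four-residue motif in this representation, and `score(lambda s: jury_matches(motifs, s), data)` would rate a jury of motifs on hypothetical labelled data.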
