Genetic Programming and Domain Knowledge in Hierarchical Multi-Classification

Created by W.Langdon from gp-bibliography.bib Revision:1.4208

  author =       "Richard Llewellyn Smith",
  title =        "Genetic Programming and Domain Knowledge in
                 Hierarchical Multi-Classification",
  school =       "University of Wales, Aberystwyth",
  year =         "2006",
  month =        mar,
  email =        "",
  keywords =     "genetic algorithms, genetic programming,
                 backpropagation, feedback, hierarchical, multiclass,
                 multilabel, classification",
  URL =          "",
  size =         "294 pages",
  abstract =     "This thesis describes an exploration of the
                 application of Genetic Programming to classification
                 problems in the domain of functional genomics. In the
                 course of the investigation a GP system was developed
                 which includes several novel features designed to
                 specifically target the features of the problem domain.
                 These features, which are described and investigated,
                 include: a graph-based program representation scheme; a
                 packaging system allowing for the construction of
                 programs in an arbitrary number of layers, designed to
                 facilitate multiple outputs; a gene pool communal to
                 the GP population, which, in conjunction with the
                 layering feature, increases the amount of effective
                 code present in the population; a generic ``genetic
                 engineering'' process for imposing constraints on the
                 programs of the system; and a backpropagation-inspired
                 feedback mechanism which increases the speed at which
                 the system learns.

                 These features were then tested on data sets exhibiting
                 the properties of genomic data sets -- in particular,
                 multiple non-disjoint classes, and hierarchical
                 classification schemes. These features are then
                 developed, along with fitness measures and the
                 evolutionary process to best model data sets of this
                 type. An exploratory attempt to produce classifications
                 in conjunction with a graph-based classification scheme
                 is also performed.

                 The developed system is then applied to classifying
                 hard genomic data sets, and the thesis ultimately makes
                 some predictions concerning previously unclassified
