Automatic Feature Extraction for Pattern Recognition

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

@PhdThesis{sherrah:1998:thesis,
  author =       "Jamie Sherrah",
  title =        "Automatic Feature Extraction for Pattern Recognition",
  school =       "University of Adelaide, South Australia",
  year =         "1998",
  month =        jul,
  keywords =     "genetic algorithms, genetic programming",
  URL =          "http://hdl.handle.net/2440/19264",
  size =         "261 pages",
  abstract =     "A typical pattern recognition system consists of two
                 stages: the pre-processing stage to extract features
                 from the data, and the classification stage to assign
                 the feature vector to one of several classes. While
                 many general classifiers exist and are well-understood,
                  the pre-processing stage is usually ad hoc and designed
                 by hand. Although the accuracy of the classifier is
                 heavily dependent on the choice of features, there is
                 little more guidance in the process of manual feature
                 extraction than intuition, experience and
                 trial-and-error.

                 To achieve automatic and near-optimal pre-processor
                 design, a framework is required for the
                 problem-independent extraction of features. Within such
                 a framework, the concept of an optimal pre-processor
                 can be formulated. The framework must allow
                 pre-processors which are universally applicable and
                 realisable using finite resources. Those frameworks
                 already in existence, such as principal-component
                 analysis and multi-layer perceptrons, are either unable
                 to cope with arbitrary non-linearity or unable to be
                 implemented using finite resources because they employ
                 one type of constituent function and have a fixed
                 structure.

                 In this thesis, a framework for automatic feature
                 extraction is proposed, called the {"}generalised
                  pre-processor{"}. This is an arbitrarily interconnected
                 feed-forward network with arbitrary non-linear
                 functions at the nodes. The use of different
                 constituent functions and irregular inter-connection
                 strategies allows for the economic realisation of a
                 pre-processor in more situations than the more uniform
                 universal approximators, such as the multi-layer
                 perceptron. A software system called the
                 {"}Evolutionary Pre-Processor{"} is presented which
                 performs a search over the space of generalised
                 pre-processors. The system is used for supervised
                 classification, and must be provided with a data set of
                 measurement vectors and associated class labels. Based
                 on genetic programming, the evolutionary pre-processor
                  begins with a population of randomly generated
                 pre-processors. The fitness of each pre-processor is
                 based on the estimated misclassification cost of a
                 classifier trained on the pre-processed data. Through
                 fitness-proportionate reproduction and recombination,
                 the ability of the pre-processors to separate the data
                 increases with generations.

                 The evolutionary pre-processor has been tested on 15
                 real and synthetic public-domain data sets. Neural
                 networks, decision trees and five simple statistical
                 classification techniques were applied to the same
                 problems, and the results compared. The results show
                 that the evolutionary pre-processor maintains good
                 classification and generalisation performance, and is
                 more accurate on average than the decision tree method.
                 The neural network achieved the lowest classification
                 errors on average, but was surpassed by the
                 evolutionary pre-processor on some synthetic problems.
                 Both the evolutionary pre-processor and the decision
                 tree produce solutions which can be understood and
                 interpreted by the user. These results must be
                 considered with care, however, as they fluctuate with
                  different random seeds and partitioning of the data.

                  The investigations of this thesis have revealed that a
                 search over pre-processors is feasible. The synthesis
                 of pre-processors from a variety of non-linear, and
                 even discontinuous functions occasionally provides
                 better discrimination than existing methods of
                 classification, but for most problems gradient-descent
                 methods are adequate. The evolutionary pre-processor
                 has advantages for knowledge discovery due to the
                 versatility with which appropriate functions can be
                 combined, but is limited due to the high variability in
                 results. It should be used in conjunction with other
                 methods of knowledge discovery for reliable results.
                  The evolved pre-processors and simple classifiers used
                  by EPrep (the Evolutionary Pre-Processor) result in
                  relatively accurate classification
                 systems that can be implemented more economically than
                 other methods.",
}
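The abstract describes the core mechanism concretely enough to sketch: pre-processors are evolved by genetic programming, each one's fitness is derived from how well a simple classifier performs on the pre-processed data, and the population is updated by fitness-proportionate reproduction and recombination. The following is a minimal illustrative sketch of that scheme, not the thesis's actual EPrep system — the primitive set, the nearest-class-mean classifier, the depth cap, and all parameters are assumptions chosen for brevity.

```python
import random

random.seed(0)

# Node primitives for a "generalised pre-processor": a mix of smooth and
# discontinuous functions, since the thesis allows arbitrary non-linearities.
# These primitives and the tiny classifier below are illustrative only.
FUNCS = {
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "neg": (1, lambda a: -a),
    "step": (1, lambda a: 1.0 if a > 0 else 0.0),  # discontinuous primitive
}

def random_tree(n_inputs, depth=3):
    """Grow a random expression tree over raw measurements x0..x{n-1}."""
    if depth == 0 or random.random() < 0.3:
        return ("x", random.randrange(n_inputs))
    name = random.choice(sorted(FUNCS))
    arity, _ = FUNCS[name]
    return (name,) + tuple(random_tree(n_inputs, depth - 1) for _ in range(arity))

def evaluate(tree, x):
    """Apply the pre-processor tree to one measurement vector."""
    if tree[0] == "x":
        return x[tree[1]]
    _, fn = FUNCS[tree[0]]
    return fn(*(evaluate(c, x) for c in tree[1:]))

def all_paths(tree, path=()):
    """Yield the index path of every node, for subtree selection."""
    yield path
    if tree[0] != "x":
        for i, child in enumerate(tree[1:], 1):
            yield from all_paths(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, sub):
    if not path:
        return sub
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], sub),) + tree[i + 1:]

def depth_of(tree):
    return 1 if tree[0] == "x" else 1 + max(depth_of(c) for c in tree[1:])

def crossover(a, b, max_depth=10):
    """Subtree crossover (recombination); reject offspring that grow too deep."""
    pa = random.choice(list(all_paths(a)))
    pb = random.choice(list(all_paths(b)))
    child = replace(a, pa, get(b, pb))
    return child if depth_of(child) <= max_depth else a

def fitness(tree, data):
    """Accuracy of a nearest-class-mean classifier trained on the single
    derived feature -- a stand-in for the estimated misclassification cost."""
    feats = [(evaluate(tree, x), y) for x, y in data]
    groups = {}
    for v, y in feats:
        groups.setdefault(y, []).append(v)
    means = {y: sum(vs) / len(vs) for y, vs in groups.items()}
    correct = sum(1 for v, y in feats
                  if min(means, key=lambda c: abs(means[c] - v)) == y)
    return correct / len(feats)

def select(pop, fits):
    """Fitness-proportionate (roulette-wheel) selection."""
    r = random.random() * sum(fits)
    acc = 0.0
    for tree, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return tree
    return pop[-1]

def evolve(data, n_inputs, pop_size=30, generations=15):
    """Evolve pre-processors; return the best tree seen and its fitness."""
    pop = [random_tree(n_inputs) for _ in range(pop_size)]
    best, best_fit = pop[0], -1.0
    for _ in range(generations):
        fits = [fitness(t, data) for t in pop]
        for t, f in zip(pop, fits):
            if f > best_fit:
                best, best_fit = t, f
        pop = [crossover(select(pop, fits), select(pop, fits))
               for _ in range(pop_size)]
    return best, best_fit
```

On a toy XOR-style problem (class determined by the sign of x0*x1), neither raw measurement separates the classes on its own, but the hand-built pre-processor `("mul", ("x", 0), ("x", 1))` yields a feature the nearest-mean classifier separates perfectly — the kind of non-linear derived feature the evolutionary search is intended to discover.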
