Bias and Variance Reduction Strategies for Improving Generalisation Performance of Genetic Programming on Binary Classification Tasks

Created by W.Langdon from gp-bibliography.bib Revision:1.4549

  author =       "Jeannie Fitzgerald",
  title =        "Bias and Variance Reduction Strategies for Improving
                 Generalisation Performance of Genetic Programming on
                 Binary Classification Tasks",
  school =       "University of Limerick",
  year =         "2014",
  address =      "Ireland",
  month =        may,
  email =        "",
  keywords =     "genetic algorithms, genetic programming,
                 generalisation, generalization, classification",
  URL =          "",
  size =         "347 pages",
  abstract =     "The central hypothesis of this thesis is that the
                 reduction of variance and inappropriate bias in GP will
                 lead to the evolution of more generalisable and robust
                 numerical binary classifiers. A secondary, supporting,
                 hypothesis is that dynamic, individualised approaches
                 may have a role to play in reducing the magnitude of
                 error due to bias and variance, as such approaches can
                 introduce diversity and change into the learning
                 system. We expect that, where an influencing parameter
                 is applied identically to each member of the
                 population, and remains unchanged throughout evolution,
                 any (undesirable) effects on bias and variance
                 error are likely to be stronger than if individuals in
                 the population apply the same parameter differently,
                 and where the application of any such parameter can
                 change in response to system behaviour. In other words,
                 a monolithic system may suffer from monolithic bias,
                 and we believe that the introduction of individualised,
                 dynamic approaches may have a beneficial effect in
                 diluting this, leading to improved generalisation in
                 the GP learner. We explore the concepts of bias and
                 variance as components of generalisation error for
                 binary classification tasks, and investigate aspects of
                 the GP paradigm which may influence these error
                 components. Specifically, we identify sources of
                 variance, language bias, search bias and selection bias
                 inherent in standard GP for binary classification and
                 pose several core questions relating to these sources.
                 If the research can be shown to affirmatively answer
                 these core questions, then our hypotheses will have
                 been proved.

                 In responding to the core questions we carry out
                 several empirical studies with the objective of gaining
                 a deeper understanding of the impacts of these sources
                 of bias and variance on generalisation and we propose
                 several novel approaches which may be used to reduce
                 variance, or to replace inappropriate inductive biases
                 with more appropriate ones, with a view to improving
                 generalisation performance.

                 Ultimately we combine several techniques, developed to
                 address our fundamental questions, into a single,
                 optimised GP (OGP) configuration. This is evaluated on
                 nine different binary classification tasks and compared
                 with the performance of several well-known and
                 respected machine learning algorithms on the same
                 datasets. Results of these experiments demonstrate that
                 a GP learner which has been optimised to reduce
                 variance and bias error through individualised, dynamic
                 and population-based adaptations can deliver
                 classification performance which is competitive with
                 other machine learning algorithms.

                 The empirical studies and proposed techniques described
                 in this thesis provide answers to the core questions
                 which we believe validate our central and supporting
  notes =        "Supervisor: Prof. Conor Ryan External Examiner: Dr.
                 Anna I. Esparcia-Alcazar Internal Examiner: Dr. Michael
