Balancing Learning and Overfitting in Genetic Programming with Interleaved Sampling of Training Data

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Ivo Goncalves and Sara Silva",
  title =        "Balancing Learning and Overfitting in Genetic
                 Programming with Interleaved Sampling of Training
                 Data",
  booktitle =    "Proceedings of the 16th European Conference on Genetic
                 Programming, EuroGP 2013",
  year =         "2013",
  month =        "3-5 " # apr,
  editor =       "Krzysztof Krawiec and Alberto Moraglio and Ting Hu and 
                 A. Sima Uyar and Bin Hu",
  series =       "LNCS",
  volume =       "7831",
  publisher =    "Springer Verlag",
  address =      "Vienna, Austria",
  pages =        "73--84",
  organisation = "EvoStar",
  keywords =     "genetic algorithms, genetic programming, Overfitting,
                 Generalisation, Pharmacokinetics, Drug Discovery",
  isbn13 =       "978-3-642-37206-3",
  DOI =          "10.1007/978-3-642-37207-0_7",
  abstract =     "Generalisation is the ability of a model to perform
                 well on cases not seen during the training phase. In
                 Genetic Programming, generalisation has recently been
                 recognised as an important open issue, and increased
                 efforts are being made towards evolving models that do
                 not overfit. In this work we expand on recent
                 developments that showed that using a small and
                 frequently changing subset of the training data is
                 effective in reducing overfitting and improving
                 generalisation. In particular, we build upon the idea of
                 randomly choosing a single training instance at each
                 generation and balance it with periodically using all
                 training data. The motivation for this approach is
                 based on trying to keep overfitting low (represented by
                 using a single training instance) and still presenting
                 enough information so that a general pattern can be
                 found (represented by using all training data). We
                 propose two approaches called interleaved sampling and
                 random interleaved sampling that respectively represent
                 doing this balancing in a deterministic or a
                 probabilistic way. Experiments are conducted on three
                 high-dimensional real-life datasets in the
                 pharmacokinetics domain. Results show that most of the
                 variants of the proposed approaches are able to
                 consistently improve generalisation and reduce
                 overfitting when compared to standard Genetic Programming.
                 The best variants even achieve such improvements on
                 a dataset where a recent and representative
                 state-of-the-art method could not. Furthermore, the
                 resulting models are short and hence easier to
                 interpret, an important achievement from the
                 applications' point of view.",
  notes =        "Part of \cite{Krawiec:2013:GP} EuroGP'2013 held in
                 conjunction with EvoCOP2013, EvoBIO2013, EvoMusArt2013
                 and EvoApplications2013",
