Filtering Outliers in One Step with Genetic Programming

Created by W.Langdon from gp-bibliography.bib Revision:1.4504

  author =       "Uriel Lopez and Leonardo Trujillo and 
                 Pierrick Legrand",
  title =        "Filtering Outliers in One Step with Genetic
                 Programming",
  booktitle =    "15th International Conference on Parallel Problem
                 Solving from Nature",
  year =         "2018",
  editor =       "Anne Auger and Carlos M. Fonseca and Nuno Lourenco and 
                 Penousal Machado and Luis Paquete and Darrell Whitley",
  volume =       "11101",
  series =       "LNCS",
  pages =        "209--222",
  address =      "Coimbra, Portugal",
  month =        "8-12 " # sep,
  publisher =    "Springer",
  keywords =     "genetic algorithms, genetic programming, Outliers,
                 Robust regression",
  isbn13 =       "978-3-319-99252-5",
  URL =          "",
  DOI =          "doi:10.1007/978-3-319-99253-2_17",
  abstract =     "Outliers are one of the most difficult issues when
                 dealing with real-world modelling tasks. Even a small
                 percentage of outliers can impede a learning
                 algorithm's ability to fit a dataset. While robust
                 regression algorithms exist, they fail when a dataset
                 is corrupted by more than 50 percent of outliers
                 (breakdown point). In the case of Genetic Programming,
                 robust regression has not been properly studied. In
                 this paper we present a method that works as a filter,
                 removing outliers from the target variable (vertical
                 outliers). The algorithm is simple: it uses a randomly
                 generated population of GP trees to determine which
                 target values should be labelled as outliers. The
                 method is highly efficient. Results show that it can
                 return a clean dataset when contamination reaches as
                 high as 90%, and may be able to handle higher levels of
                 contamination. In this study only synthetic univariate
                 benchmarks are used to evaluate the approach, but it
                 must be stressed that no other approaches can deal with
                 such high levels of outlier contamination while
                 requiring such small computational effort.",
  notes =        "PPSN2018

                 This two-volume set LNCS 11101 and 11102 constitutes
                 the refereed proceedings of the 15th International
                 Conference on Parallel Problem Solving from Nature,
                 PPSN 2018",

Genetic Programming entries for Uriel Lopez, Leonardo Trujillo, Pierrick Legrand