Symbolic Regression via GP as a Discovery Engine: Insights on Outliers and Prototypes

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Mark E. Kotanchek and Ekaterina Y. Vladislavleva and 
                 Guido F. Smits",
  title =        "Symbolic Regression via GP as a Discovery Engine:
                 Insights on Outliers and Prototypes",
  booktitle =    "Genetic Programming Theory and Practice {VII}",
  year =         "2009",
  editor =       "Rick L. Riolo and Una-May O'Reilly and 
                 Trent McConaghy",
  series =       "Genetic and Evolutionary Computation",
  address =      "Ann Arbor",
  month =        "14-16 " # may,
  publisher =    "Springer",
  chapter =      "4",
  pages =        "55--72",
  keywords =     "genetic algorithms, genetic programming, symbolic
                 regression, data modeling, system identification,
                 research assistant, discovery engine, outlier
                 detection, outliers, prototypes, data balancing",
  isbn13 =       "978-1-4419-1653-2",
  DOI =          "doi:10.1007/978-1-4419-1626-6_4",
  abstract =     "In this chapter we illustrate a framework based on
                 symbolic regression that generates and sharpens
                 questions about the nature of the data-generating
                 system and provides additional context and
                 understanding from multivariate numeric data. We
                 emphasize the need to approach data modeling
                 globally, iteratively applying data analysis and
                 adaptation, model building, and problem reduction
                 procedures. We demonstrate the framework on the
                 problem of detecting outliers and extracting
                 significant features from the CountryData, a
                 collection of economic, political, social, and
                 geographic data. We present two complementary ways of
                 extracting outliers from the data: the content-based
                 and the model-based approaches. The content-based
                 approach, applied before any modeling, studies the
                 geometric structure of the multivariate data, uses
                 data-balancing algorithms to sort the records in order
                 of decreasing typicalness, and identifies the outliers
                 as the least typical records. The model-based approach
                 uses symbolic regression via Pareto genetic
                 programming to identify records that are
                 systematically under- or over-predicted by diverse
                 ensembles of thousands of global nonlinear symbolic
                 regression models.

                 Applied to the CountryData, both approaches yield
                 insights into the division of the world's countries
                 into outliers and prototypes, and into the key
                 economic properties that predict gross domestic
                 product (GDP) per capita.",
  notes =        "part of \cite{Riolo:2009:GPTP}",
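
The content-based approach described in the abstract ranks records by typicalness before any model is built and treats the least typical records as outliers. The Python sketch below illustrates that idea under stated assumptions: typicalness is approximated by the mean Euclidean distance from a record to its k nearest neighbours after each column is standardized. This is a simple stand-in, not the chapter's data-balancing algorithm, and the function names, `k`, and the default 10% outlier fraction are hypothetical.

```python
import math
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every column to zero mean and unit population standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column gets a scale of 1.0 so we never divide by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def typicalness_order(rows: Sequence[Sequence[float]], k: int = 2) -> List[int]:
    """Return record indices sorted from most typical to least typical.

    Hypothetical proxy for data balancing: a record's atypicalness is its
    mean distance to its k nearest neighbours in standardized space.
    """
    pts = standardize(rows)
    n = len(pts)
    k = max(1, min(k, n - 1))

    def atypicalness(i: int) -> float:
        d = sorted(math.dist(pts[i], pts[j]) for j in range(n) if j != i)
        return sum(d[:k]) / k

    scores = [atypicalness(i) for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i])


def outliers(rows: Sequence[Sequence[float]], k: int = 2,
             fraction: float = 0.1) -> List[int]:
    """Flag the least typical `fraction` of records (at least one) as outliers."""
    order = typicalness_order(rows, k)
    m = max(1, round(fraction * len(order)))
    return order[-m:]


# Five tightly clustered records plus one isolated record at index 0.
rows = [[5.0, 9.0], [1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.05], [1.05, 1.95]]
print(outliers(rows))  # -> [0]
```

Because the score uses no model, it can run before any modeling, which matches the order of steps the abstract describes for the content-based approach.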

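
The model-based approach in the abstract flags records that an ensemble of models systematically under- or over-predicts. The Python sketch below shows only that flagging step and rests on assumptions: the Pareto GP ensemble of nonlinear models is replaced by hypothetical leave-one-out least-squares lines, and the thresholds `agreement` (share of models that must err in the same direction) and `min_ratio` (how much larger a record's median absolute residual must be than the typical one) are arbitrary choices, not values taken from the chapter.

```python
import statistics
from typing import Callable, Dict, List, Sequence

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares line y = a + b*x (hypothetical stand-in for one GP model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def jackknife_ensemble(xs: List[float], ys: List[float]) -> List[Model]:
    """Hypothetical ensemble: one line fitted to each leave-one-out subset."""
    return [fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
            for i in range(len(xs))]


def systematic_misfits(xs: Sequence[float], ys: Sequence[float],
                       models: Sequence[Model], agreement: float = 0.9,
                       min_ratio: float = 3.0) -> Dict[int, str]:
    """Flag records the ensemble consistently mispredicts in one direction.

    A record is flagged when at least `agreement` of the models err with the
    same sign AND its median absolute residual exceeds `min_ratio` times the
    median of that quantity over all records.
    """
    # residuals[i][j] = y_i - f_j(x_i); positive means model j under-predicts record i.
    residuals = [[y - f(x) for f in models] for x, y in zip(xs, ys)]
    med_abs = [statistics.median(abs(r) for r in rs) for rs in residuals]
    typical = statistics.median(med_abs)
    flagged: Dict[int, str] = {}
    for i, rs in enumerate(residuals):
        if med_abs[i] <= min_ratio * typical:
            continue  # residual too small to count as a misfit
        share_under = sum(r > 0 for r in rs) / len(rs)
        if share_under >= agreement:
            flagged[i] = "under-predicted"
        elif share_under <= 1 - agreement:
            flagged[i] = "over-predicted"
    return flagged


# Noisy linear data with one record pushed far above the trend.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
ys[5] += 5.0
print(systematic_misfits(xs, ys, jackknife_ensemble(xs, ys)))
# -> {5: 'under-predicted'}
```

Requiring both sign agreement and a large residual matters here: near-identical ensemble members tend to agree on the sign of even small residuals, so sign agreement alone would also flag ordinary records.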