Created by W.Langdon from gp-bibliography.bib Revision:1.4020

@PhdThesis{vladislavleva:2008:thesis, author = "Ekaterina Vladislavleva", title = "Model-based Problem Solving through Symbolic Regression via Pareto Genetic Programming", school = "Tilburg University", year = "2008", address = "Tilburg, the Netherlands", month = aug, isbn13 = "978 90 5668 217 0", keywords = "genetic algorithms, genetic programming, symbolic regression", URL = "http://arno.uvt.nl/show.cgi?fid=80764", broken = "http://center.uvt.nl/gs/thesis/vladislavleva.html", size = "288 pages", abstract = "The main focus of this dissertation is identification of relationships from given input-output data by means of symbolic regression. The challenging task of symbolic regression is to identify and express a real or simulated system or a process, based on a limited number of observations of the system's behaviour. The system under study is being characterised by some important control parameters which need to be available for an observer, but usually are difficult to monitor, e.g. they need to be measured in a lab, simulated or observed in real time only, or at high time and computational expenses. Empirical modelling attempts to express these critical control variables via other controllable variables that are easier to monitor, can be measured more accurately or timely, are cheaper to simulate, etc. Symbolic regression provides such expressions of crucial process characteristics, or, response variables, defined (symbolically) as mathematical functions of some of the easy-to-measure input variables, and calls these expressions empirical input-response models. 
Examples of these are (i) structure-activity relationships in pharmaceutical research, which define the activity of a drug through the physical structure of molecules of drug components, (ii) structure-property relationships in material science, which define product qualities, such as shininess, opacity, smell, or stiffness through physical properties of composites and processing conditions, or (iii) economic models, e.g. expressing return on investment through daily closes of S&P 500 quotes and inflation rates.", abstract = "Industrial modelling problems that are tractable for symbolic regression have two main characteristics: (1) No or little information is known about the underlying system producing the data, and therefore no assumptions on model structure can be made; (2) The available data is high-dimensional, and often not balanced, with either abundant or insufficient number of samples. To discover plausible models with realistic time and computational efforts, symbolic regression exploits a stochastic iterative search technique, based on artificial evolution of model expressions. This method, called genetic programming, looks for appropriate expressions of the response variable in the space of all valid formulae containing a minimal set of input variables and a proposed set of basic operators and constants. At each step, the genetic programming system considers a sufficiently large quantity of various formulae, selects the subset of the best formulae according to certain user-defined criteria of goodness, and (re)combines the best formulae to create a rich set of potential solutions for the next step. This approach is inspired by principles of natural selection, where the offspring that inherits good features from both parents increases the chances to be successful in survival, adaptation, and further propagation. 
The challenge and the rationale of performing evolutionary search is to balance the exploitation of the good solutions discovered so far, with exploration of the new areas of the search space, where even better solutions may be found.", abstract = "The fact that symbolic regression via genetic programming (GP) does not impose any assumptions on the structure of the input-output models means that the model structure is to a large extent determined by data and also by selection objectives used in the evolutionary search. On one hand, it is an advantage and the unique capability compared with other global approximation techniques, since it potentially allows to develop inherently simpler models than, for example, by interpolation with polynomials or spatial correlation analysis. On the other hand, the absence of constraints on model structure is the greatest challenge for symbolic regression since it vastly increases the search space of possible solutions which is already inherently large. A special multi-objective flavour of a genetic programming search is considered, called Pareto GP. Pareto GP used for symbolic regression has strong advantages in creating diverse sets of regression models, satisfying competing criteria of model structural simplicity and model prediction accuracy.", abstract = "This thesis extends the Pareto genetic programming methodology by additional generic model selection and generation strategies that (1) drive the modelling engine to creation of models of reduced non-linearity and increased generalisation capabilities, and (2) improve the effectiveness of the search for robust models by goal softening, adaptive fitness evaluations, and enhanced training strategies. In addition to the new strategies for model development and model selection, this dissertation presents a new approach for analysis, ranking, and compression of given multi-dimensional input-output data for the purpose of balancing the information content in undesigned data sets. 
To present contributions of this research in the context of real-life problem solving, the dissertation exploits a generic framework of adaptive model-based problem solving used in many industrial modelling applications. This framework consists of an iterative feed-back loop over: (Part I) data generation, analysis and adaptation, (Part II) model development, and (Part III) problem analysis and reduction.", abstract = "Part I of the thesis consists of Chapter 2 and is devoted to data analysis. It studies the ways to balance multi-dimensional input-output data for making further modelling more successful. Chapter 2 proposes several novel methods for interpretation and manipulation of given high-dimensional input-output data such as relative weighting of the data, ranking the data records in the order of increasing importance, and assessing the compressibility and information content of a multi-dimensional data set. All methods exploit the geometrical structure of the data and relative distances to nearest-in-the-input-space neighbours. All methods treat response values differently, assuming that the data belongs to a response surface, which needs to be identified. Part II of the thesis consists of Chapters 3-7 and addresses the model induction method - Pareto genetic programming. Since time to solution, or, more accurately, time-to-convincing-solution is a major practical challenge of evolutionary search algorithms, and Pareto GP in particular, Part II focuses on algorithmic enhancements of Pareto GP that lead it to the discovery of better solutions faster (i.e. solutions of sufficient quality at a smaller computational effort, or of considerably better quality at the same computational effort).", abstract = " In Chapter 3 a general description of the Pareto GP methodology is presented in a framework of evolutionary search, as an iterative loop over the stages of model generation, model evaluation, and model selection. 
In Chapter 5 a novel strategy for model selection through explicit non-linearity control is presented. A new complexity measure called the order of non-linearity of symbolic models is introduced and used BibTeX entry too long. Truncated
