Data Mining Approaches to Complex Environmental Problems

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "David J. Hill",
  title =        "Data Mining Approaches to Complex Environmental
                 Problems",
  school =       "Environmental Engineering in Civil Engineering,
                 University of Illinois at Urbana-Champaign",
  year =         "2007",
  address =      "Urbana, Illinois, USA",
  month =        "23 " # jul,
  keywords =     "genetic algorithms, genetic programming",
  URL =          "",
  size =         "195 pages",
  abstract =     "Understanding and predicting the behaviour of
                 large-scale environmental systems is necessary for
                 addressing many challenging problems of environmental
                 interest. Unfortunately, the challenge of scaling
                 predictive models, as well as the difficulty of
                 parametrising these models, makes it difficult to apply
                 them to large-scale systems. This research addresses
                 these issues through the use of data mining.
                 Specifically, this dissertation addresses two problems:
                 upscaling models of solute transport in porous media
                 and detecting anomalies in streaming environmental
                 data.

                 Upscaling refers to the creation of models that do not
                 need to explicitly resolve all scales of system
                 heterogeneity. Upscaled models require significantly
                 fewer computational resources than do models that
                 resolve small-scale heterogeneity. This research
                 develops an upscaling method based on genetic
                 programming (GP), which facilitates both the GP search
                 and the implementation of the resulting models, and
                 demonstrates its use and efficacy through a case
                 study.

                 Anomaly detection is the task of identifying data that
                 deviate from historical patterns. It has many practical
                 applications, such as data quality assurance and
                 control (QA/QC), focused data collection, and event
                 detection. The second portion of this dissertation
                 develops a suite of data-driven anomaly detection
                 methods, based on autoregressive data-driven models
                 (e.g. artificial neural networks) and dynamic Bayesian
                 network (DBN) models of the sensor data stream. All of
                 the developed methods perform fast, incremental
                 evaluation of data as it becomes available; scale to
                 large quantities of data; and require no a priori
                 information regarding process variables or types of
                 anomalies that may be encountered. Furthermore, the
                 methods can be easily deployed on large heterogeneous
                 sensor networks. The anomaly detection methods are then
                 applied to a sensor network located in Corpus Christi
                 Bay, Texas, and their abilities to identify both real
                 and synthetic anomalies in meteorological data are
                 compared. Results of these case studies indicate that
                 DBN-based detectors, using either robust Kalman
                 filtering or Rao-Blackwellized particle filtering, are
                 most suitable for the Corpus Christi meteorological
                 data.",
