Discovering Time Oriented Abstractions in Historical Data to Optimize Decision Tree Classification

Created by W.Langdon from gp-bibliography.bib Revision:1.3973

@InCollection{masand:1996:aigp2,
  author =       "Brij Masand and Gregory Piatetsky-Shapiro",
  title =        "Discovering Time Oriented Abstractions in Historical
                 Data to Optimize Decision Tree Classification",
  booktitle =    "Advances in Genetic Programming 2",
  publisher =    "MIT Press",
  year =         "1996",
  editor =       "Peter J. Angeline and K. E. {Kinnear, Jr.}",
  pages =        "489--498",
  chapter =      "24",
  address =      "Cambridge, MA, USA",
  keywords =     "genetic algorithms, genetic programming",
  ISBN =         "0-262-01158-1",
  URL =          "http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6277523",
  size =         "10 pages",
  abstract =     "This paper explores the synergy between GP and
                 decision-tree-based classification. We address the
                 problem of identifying 'good' customers (e.g. those
                 who respond to special offers) by analysing historical
                 customer billing data with decision tree classifiers
                 such as C4.5 [Quinlan 1993] and optimising their
                 performance using GP [Koza 1992]. One difficult issue
                 is how to transform and abstract raw historical data
                 from several months for the purpose of analysis. We
                 address this by using GP to discover time-oriented
                 abstractions of the data that enable better prediction
                 performance than is possible with the raw data alone.
                 We also contrast the performance improvement obtained
                 by generating random populations with comparable
                 computational effort against that of GP evolution on
                 smaller populations. Using C4.5 alone we obtain a
                 prediction error of about 38 percent (on a test set
                 split 50-50 between non-responders and responders).
                 Using the additional fields derived from the raw
                 billing data, we reduce the error to 35.9 percent, a
                 significant reduction for this domain. Each 1 percent
                 of improved performance (on real data) is worth about
                 $1 million in potential increased revenues.",
}
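The abstract describes using GP to evolve time-oriented derived fields from several months of raw billing data, with a decision tree classifier judging how useful each field is. The Python below is a minimal sketch of that idea under loudly stated assumptions, not the authors' implementation: the six-month synthetic billing data, the window aggregates (`avg`, `max`, `min` over a month range) combined with `sub` and protected `div`, and every function name are hypothetical. It also swaps C4.5 for a one-split decision stump fit on the single derived field, so fitness is the stump's error on a held-out validation set. `random_search` spends the same number of fitness evaluations on random trees, loosely mirroring the abstract's comparison between random populations and GP evolution.

```python
# Hypothetical sketch, not the authors' method: synthetic 6-month billing data,
# window aggregates (avg/max/min) combined by sub and protected div, and a
# one-split decision stump standing in for C4.5 as the fitness evaluator.
import random

MONTHS = 6  # hypothetical length of the billing history
AGGS = {"avg": lambda xs: sum(xs) / len(xs), "max": max, "min": min}

def make_data(n, rng):
    """Synthetic customers: responders tend to have rising monthly bills."""
    labels = [rng.random() < 0.5 for _ in range(n)]
    rows = []
    for y in labels:
        base = rng.uniform(20, 100)
        trend = rng.uniform(2, 8) if y else rng.uniform(-4, 2)  # bill change per month
        rows.append([base + trend * m + rng.gauss(0, 5) for m in range(MONTHS)])
    return rows, labels

def random_tree(rng, depth=2):
    """Leaf: aggregate over months [i, j), e.g. ("avg", 3, 6). Inner: sub/div."""
    if depth == 0 or rng.random() < 0.3:
        i = rng.randrange(MONTHS)
        return (rng.choice(list(AGGS)), i, rng.randrange(i + 1, MONTHS + 1))
    op = rng.choice(["sub", "div"])
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, bills):
    if tree[0] in AGGS:
        return AGGS[tree[0]](bills[tree[1]:tree[2]])
    a, b = evaluate(tree[1], bills), evaluate(tree[2], bills)
    if tree[0] == "sub":
        return a - b
    return a / b if abs(b) > 1e-9 else 0.0  # protected division

def fit_stump(values, labels):
    """Best single threshold on one field; returns (threshold, predict_above)."""
    pairs = sorted(zip(values, labels))
    n, pos_total = len(pairs), sum(labels)
    best, pos_left = (n + 1, 0.0, True), 0
    for k in range(n):  # left side = pairs[:k], right side = pairs[k:]
        if k > 0:
            pos_left += pairs[k - 1][1]
            if pairs[k - 1][0] == pairs[k][0]:
                continue  # no split point between equal values
        thr = (pairs[k - 1][0] + pairs[k][0]) / 2 if k else float("-inf")
        neg_left, pos_right = k - pos_left, pos_total - pos_left
        neg_right = (n - k) - pos_right
        for above, errs in ((True, pos_left + neg_right), (False, neg_left + pos_right)):
            if errs < best[0]:
                best = (errs, thr, above)
    return best[1], best[2]

def fitness(tree, train, valid):
    """Validation error of a stump fit on the derived field (lower is better)."""
    (xtr, ytr), (xva, yva) = train, valid
    thr, above = fit_stump([evaluate(tree, r) for r in xtr], ytr)
    wrong = sum(((evaluate(tree, r) > thr) == above) != y for r, y in zip(xva, yva))
    return wrong / len(yva)

def subtrees(tree, path=()):
    yield path, tree  # path = tuple of child indices from the root
    if tree[0] not in AGGS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace(tree, path, new):
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b at a random point of a."""
    path, _ = rng.choice(list(subtrees(a)))
    return replace(a, path, rng.choice(list(subtrees(b)))[1])

def best_of(trees, train, valid):
    return min(((fitness(t, train, valid), t) for t in trees), key=lambda p: p[0])

def evolve(train, valid, rng, pop_size=30, generations=10):
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t, train, valid), t) for t in pop]
        def pick():  # size-3 tournament selection
            return min(rng.sample(scored, 3), key=lambda p: p[0])[1]
        elite = min(scored, key=lambda p: p[0])[1]  # keep the best field unchanged
        pop = [elite] + [crossover(pick(), pick(), rng) for _ in range(pop_size - 1)]
    return best_of(pop, train, valid)

def random_search(train, valid, rng, evaluations=330):
    """Same budget as evolve()'s defaults: 30 trees x (10 + 1) rounds."""
    return best_of([random_tree(rng) for _ in range(evaluations)], train, valid)

if __name__ == "__main__":
    rng = random.Random(0)
    train, valid = make_data(200, rng), make_data(200, rng)
    print("GP:", evolve(train, valid, rng))
    print("random search:", random_search(train, valid, rng))
```

Running the script prints, for each search, a tuple of the best validation error and the corresponding field as a nested tuple (for example, a difference of two monthly averages). The numbers depend on the seed and the synthetic data and are not comparable with the 38 and 35.9 percent figures reported in the abstract.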
