Using feature construction to avoid large feature spaces in text classification

Created by W.Langdon from gp-bibliography.bib Revision:1.4192

  author =       "Elijah Mayfield and Carolyn Penstein-Rose",
  title =        "Using feature construction to avoid large feature
                 spaces in text classification",
  booktitle =    "GECCO '10: Proceedings of the 12th annual conference
                 on Genetic and evolutionary computation",
  year =         "2010",
  editor =       "Juergen Branke and Martin Pelikan and Enrique Alba and 
                 Dirk V. Arnold and Josh Bongard and 
                 Anthony Brabazon and Martin V. Butz and 
                 Jeff Clune and Myra Cohen and Kalyanmoy Deb and 
                 Andries P Engelbrecht and Natalio Krasnogor and 
                 Julian F. Miller and Michael O'Neill and Kumara Sastry and 
                 Dirk Thierens and Jano {van Hemert} and Leonardo Vanneschi and 
                 Carsten Witt",
  isbn13 =       "978-1-4503-0072-8",
  pages =        "1299--1306",
  keywords =     "genetic algorithms, genetic programming, NLP, Natural
                 Language Processing, Text analysis, SVM",
  month =        "7-11 " # jul,
  organisation = "SIGEVO",
  address =      "Portland, Oregon, USA",
  DOI =          "10.1145/1830483.1830714",
  publisher =    "ACM",
  publisher_address = "New York, NY, USA",
  abstract =     "Feature space design is a critical part of machine
                 learning. This is an especially difficult challenge in
                 the field of text classification, where an arbitrary
                 number of features of varying complexity can be
                 extracted from documents as a preprocessing step. A
                 challenge for researchers has consistently been to
                 balance expressiveness of features with the size of the
                 corresponding feature space, due to issues with data
                 sparsity that arise as feature spaces grow larger.
                 Drawing on past successes with genetic programming in
                 similar problems outside of text classification, we
                 propose and implement a technique for constructing
                 complex features from simpler features, and adding
                 these more complex features into a combined feature
                 space which can then be used by more sophisticated
                 machine learning classifiers. Applying this technique
                 to a sentiment analysis problem, we show encouraging
                 improvement in classification accuracy, with a small
                 and constant increase in feature space size. We also
                 show that the features we generate carry far more
                 predictive power than any of the simple features they [...]",
  notes =        "Also known as \cite{1830714}. GECCO-2010: a joint
                 meeting of the nineteenth international conference on
                 genetic algorithms (ICGA-2010) and the fifteenth annual
                 genetic programming conference (GP-2010)",
