Genetic programming for feature construction and selection in classification on high-dimensional data

Created by W.Langdon from gp-bibliography.bib Revision:1.3973

@Article{journals/memetic/TranXZ16,
  author =       "Binh Tran and Bing Xue and Mengjie Zhang",
  title =        "Genetic programming for feature construction and
                 selection in classification on high-dimensional data",
  journal =      "Memetic Computing",
  year =         "2016",
  number =       "1",
  volume =       "8",
  bibdate =      "2016-02-17",
  bibsource =    "DBLP,
                 http://dblp.uni-trier.de/db/journals/memetic/memetic8.html#TranXZ16",
  pages =        "3--15",
  URL =          "http://dx.doi.org/10.1007/s12293-015-0173-y",
  DOI =          "10.1007/s12293-015-0173-y",
  month =        mar,
  keywords =     "genetic algorithms, genetic programming, Feature
                 construction, Feature selection, Classification,
                 High-dimensional data",
  ISSN =         "1865-9284",
  abstract =     "Classification on high-dimensional data with thousands
                 to tens of thousands of dimensions is a challenging
                 task due to the high dimensionality and the quality of
                 the feature set. The problem can be addressed by using
                 feature selection to choose only informative features
                 or feature construction to create new high-level
                 features. Genetic programming (GP) using a tree-based
                 representation can be used for both feature
                 construction and implicit feature selection. This work
                 presents a comprehensive study to investigate the use
                 of GP for feature construction and selection on
                 high-dimensional classification problems. Different
                 combinations of the constructed and/or selected
                 features are tested and compared on seven
                 high-dimensional gene expression problems, and
                 different classification algorithms are used to
                 evaluate their performance. The results show that the
                 constructed and/or selected feature sets can
                 significantly reduce the dimensionality and maintain or
                 even increase the classification accuracy in most
                 cases. The cases in which overfitting occurred are analysed
                 via the distribution of features. Further analysis is
                 also performed to show why the constructed feature can
                 achieve promising classification performance.",
}
