Learning Discriminative Representations from RGB-D Video Data

Created by W.Langdon from gp-bibliography.bib Revision:1.4208

  author =       "Li Liu and Ling Shao",
  title =        "Learning Discriminative Representations from {RGB-D}
                 Video Data",
  booktitle =    "Proceedings of the Twenty-Third International Joint
                 Conference on Artificial Intelligence",
  year =         "2013",
  pages =        "1493--1500",
  address =      "Beijing, China",
  publisher =    "AAAI Press",
  keywords =     "genetic algorithms, genetic programming",
  isbn13 =       "978-1-57735-633-2",
  URL =          "http://dl.acm.org/citation.cfm?id=2540128.2540343",
  acmid =        "2540343",
  URL =          "http://ijcai.org/papers13/Papers/IJCAI13-223.pdf",
  size =         "8 pages",
  abstract =     "Recently, the low-cost Microsoft Kinect sensor, which
                 can capture real-time high-resolution RGB and depth
                 visual information, has attracted increasing attention
                 for a wide range of applications in computer vision.
                 Existing techniques extract hand-tuned features from
                 the RGB and the depth data separately and heuristically
                 fuse them, which cannot fully exploit the
                 complementarity of both data sources. In this paper, we
                 introduce an adaptive learning methodology to
                 automatically extract (holistic) spatio-temporal
                 features, simultaneously fusing the RGB and depth
                 information, from RGB-D video data for visual
                 recognition tasks. We address this as an optimisation
                 problem using our proposed restricted graph-based
                 genetic programming (RGGP) approach, in which a group
                 of primitive 3D operators are first randomly assembled
                 as graph-based combinations and then evolved generation
                 by generation by evaluating on a set of RGB-D video
                 samples. Finally, the best-performing combination is
                 selected as the (near-)optimal representation for a
                 pre-defined task.

                 The proposed method is systematically evaluated on
                 SKIG, a new hand gesture dataset that we collected
                 ourselves, and on the public MSR Daily Activity 3D
                 dataset. Extensive experimental results show that
                 our approach leads to significant advantages compared
                 with state-of-the-art hand-crafted and machine-learnt
                 features.",
Genetic Programming entries for Li Liu Ling Shao