Evolutionary Development of Hierarchical Learning Structures

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Stefan Elfwing and Eiji Uchibe and Kenji Doya and 
                 Henrik I. Christensen",
  title =        "Evolutionary Development of Hierarchical Learning
                  Structures",
  journal =      "IEEE Transactions on Evolutionary Computation",
  year =         "2007",
  volume =       "11",
  number =       "2",
  pages =        "249--264",
  month =        apr,
  keywords =     "genetic algorithms, genetic programming, learning
                 (artificial intelligence), Lamarckian evolutionary
                 development, MAXQ hierarchical RL method, foraging
                 task, genetic programming, hierarchical learning
                  structures, hierarchical reinforcement learning, task
                  decomposition",
  DOI =          "doi:10.1109/TEVC.2006.890270",
  ISSN =         "1089-778X",
  abstract =     "Hierarchical reinforcement learning (RL) algorithms
                 can learn a policy faster than standard RL algorithms.
                 However, the applicability of hierarchical RL
                 algorithms is limited by the fact that the task
                 decomposition has to be performed in advance by the
                 human designer. We propose a Lamarckian evolutionary
                 approach for automatic development of the learning
                 structure in hierarchical RL. The proposed method
                 combines the MAXQ hierarchical RL method and genetic
                 programming (GP). In the MAXQ framework, a subtask can
                 optimise the policy independently of its parent task's
                 policy, which makes it possible to reuse learned
                 policies of the subtasks. In the proposed method, the
                 MAXQ method learns the policy based on the task
                 hierarchies obtained by GP, while the GP explores the
                 appropriate hierarchies using the result of the MAXQ
                 method. To show the validity of the proposed method, we
                 have performed simulation experiments for a foraging
                 task in three different environmental settings. The
                 results show strong interconnection between the
                 obtained learning structures and the given task
                 environments. The main conclusion of the experiments is
                 that the GP can find a minimal strategy, i.e., a
                 hierarchy that minimises the number of primitive
                 subtasks that can be executed for each type of
                 situation. The experimental results for the most
                 challenging environment also show that the policies of
                 the subtasks can continue to improve, even after the
                  structure of the hierarchy has been evolutionarily
                  stabilised, as an effect of Lamarckian mechanisms.",

Genetic Programming entries for Stefan Elfwing Eiji Uchibe Kenji Doya Henrik I Christensen
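The abstract describes a Lamarckian loop: GP evolves MAXQ task hierarchies, each individual learns a policy for its hierarchy, and what is learned is inherited by offspring rather than discarded. The toy sketch below illustrates that loop under loud assumptions: all names (`Individual`, `PRIMITIVES`, the stubbed `learn` step) are illustrative inventions, not the authors' code; MAXQ learning is replaced by a trivial value-accumulation stub, and fitness simply favours hierarchies with fewer primitive subtasks, echoing the "minimal strategy" the paper reports GP converging to.

```python
import random

# Illustrative sketch only: a task hierarchy is a nested tuple whose
# leaves are primitive subtasks; "learning" is a stub whose results
# persist in the genome across generations (the Lamarckian mechanism).
PRIMITIVES = ["forage", "avoid", "recharge", "wander"]

def random_hierarchy(depth=2):
    """Grow a random task hierarchy; internal nodes group subtasks."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(PRIMITIVES)
    return tuple(random_hierarchy(depth - 1) for _ in range(2))

def primitives_in(h):
    """Count primitive subtasks reachable in a hierarchy."""
    if isinstance(h, str):
        return 1
    return sum(primitives_in(c) for c in h)

class Individual:
    def __init__(self, hierarchy):
        self.hierarchy = hierarchy
        self.values = {}  # learned subtask values, kept in the genome

    def learn(self, episodes=10):
        """Stand-in for MAXQ learning: subtask values improve over episodes."""
        for _ in range(episodes):
            for p in PRIMITIVES:
                self.values[p] = self.values.get(p, 0.0) + 0.1

    def fitness(self):
        # Reward accumulated learning, penalise large hierarchies
        # (mirroring the "minimal strategy" finding in the abstract).
        return sum(self.values.values()) - primitives_in(self.hierarchy)

def evolve(pop_size=8, generations=5, seed=0):
    random.seed(seed)
    pop = [Individual(random_hierarchy()) for _ in range(pop_size)]
    for _ in range(generations):
        for ind in pop:
            ind.learn()                        # learning inside each generation
        pop.sort(key=Individual.fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = Individual(random_hierarchy())
            child.values = dict(parent.values)  # Lamarckian inheritance
            children.append(child)
        pop = survivors + children
    return max(pop, key=Individual.fitness)

best = evolve()
print(primitives_in(best.hierarchy), round(best.fitness(), 1))
```

Because offspring copy their parent's learned values instead of starting from scratch, fitness keeps improving even after the hierarchy structure stabilises, which is the Lamarckian effect the abstract reports for the most challenging environment.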