Revisiting the Acrobot `height' task: An example of Efficient Evolutionary Policy Search under an Episodic Goal Seeking Task

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  title =        "Revisiting the Acrobot `height' task: An example of
                 Efficient Evolutionary Policy Search under an Episodic
                 Goal Seeking Task",
  author =       "John Doucette and Malcolm Heywood",
  pages =        "468--475",
  booktitle =    "Proceedings of the 2011 IEEE Congress on Evolutionary
                 Computation",
  year =         "2011",
  editor =       "Alice E. Smith",
  month =        "5-8 " # jun,
  address =      "New Orleans, USA",
  organization = "IEEE Computational Intelligence Society",
  publisher =    "IEEE Press",
  ISBN =         "0-7803-8515-2",
  keywords =     "genetic algorithms, genetic programming, acrobot
                 height task domain, episodic goal seeking task,
                 evolutionary policy search approach, neural evolution
                 of augmented topologies, stochastic sampling heuristic,
                 symbiotic bid based genetic programming, temporal
                 sequence learning problem, training scenarios, learning
                 (artificial intelligence), sampling methods, search
                 problems, stochastic processes, topology",
  DOI =          "10.1109/CEC.2011.5949655",
  abstract =     "Evolutionary methods for addressing the temporal
                 sequence learning problem generally fall into policy
                 search as opposed to value function optimisation
                 approaches. Various recent results have made the claim
                 that the policy search approach is at best inefficient
                 at solving episodic `goal seeking' tasks, i.e., tasks
                 under which the reward is limited to describing
                 properties associated with a successful outcome, with
                 no qualification for degrees of failure. This work
                 demonstrates that such a conclusion is due to a lack of
                 diversity in the training scenarios. We therefore
                 return to the Acrobot `height' task domain originally
                 used to demonstrate complete failure in evolutionary
                 policy search. This time a very simple stochastic
                 sampling heuristic for defining a population of
                 training configurations is introduced. Benchmarking two
                 recent evolutionary policy search algorithms -- Neural
                 Evolution of Augmented Topologies (NEAT) and Symbiotic
                 Bid-Based (SBB) Genetic Programming -- under this
                 condition demonstrates solutions as effective as those
                 returned by advanced value function methods. Moreover,
                 this is achieved while remaining within the evaluation
                 limit imposed by the original study.",
  notes =        "CEC2011 sponsored by the IEEE Computational
                 Intelligence Society, and previously sponsored by the
                 EPS and the IET.",
