Symbolic method for deriving policy in reinforcement learning

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

@InProceedings{7798684,
  author =       "Eduard Alibekov and Jiri Kubalik and Robert Babuska",
  booktitle =    "2016 IEEE 55th Conference on Decision and Control
                  (CDC)",
  title =        "Symbolic method for deriving policy in reinforcement
                  learning",
  year =         "2016",
  pages =        "2789--2795",
  abstract =     "This paper addresses the problem of deriving a policy
                 from the value function in the context of reinforcement
                 learning in continuous state and input spaces. We
                 propose a novel method based on genetic programming to
                 construct a symbolic function, which serves as a proxy
                 to the value function and from which a continuous
                 policy is derived. The symbolic proxy function is
                 constructed such that it maximizes the number of
                 correct choices of the control input for a set of
                 selected states. Maximization methods can then be used
                 to derive a control policy that performs better than
                 the policy derived from the original approximate value
                 function. The method was experimentally evaluated on
                 two control problems with continuous spaces, pendulum
                 swing-up and magnetic manipulation, and compared to a
                 standard policy derivation method using the value
                 function approximation. The results show that the
                 proposed method and its variants outperform the
                 standard method.",
  keywords =     "genetic algorithms, genetic programming",
  DOI =          "doi:10.1109/CDC.2016.7798684",
  month =        dec,
  notes =        "Also known as \cite{7798684}",
}

Genetic Programming entries for Eduard Alibekov Jiri Kubalik Robert Babuska
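The policy-derivation step described in the abstract (use maximization over the control input on a proxy of the value function) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `v_hat` stands in for a symbolic proxy that genetic programming would evolve, `f` is a hypothetical discrete-time system, and the input grid is an assumed discretization.

```python
import numpy as np

# Hypothetical stand-in for a GP-evolved symbolic proxy V_hat(x) of the
# value function (the paper evolves such expressions; this one is hand-written).
def v_hat(x):
    return -(x[0] ** 2) - 0.1 * x[1] ** 2

# Hypothetical discrete-time dynamics x' = f(x, u) for a toy double integrator.
def f(x, u, dt=0.05):
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def derive_policy(v_hat, f, u_grid):
    """Greedy policy: choose the input whose successor state maximizes the proxy."""
    def policy(x):
        return max(u_grid, key=lambda u: v_hat(f(x, u)))
    return policy

# Continuous inputs are approximated here by a fine grid over [-2, 2].
pi = derive_policy(v_hat, f, u_grid=np.linspace(-2.0, 2.0, 41))
u = pi(np.array([1.0, 0.0]))
```

In the paper the maximization is done over continuous inputs; the grid search above is just the simplest substitute for that step.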