Genetic Programming for Reward Function Search

Created by W.Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Scott Niekum and Andrew G. Barto and Lee Spector",
  title =        "Genetic Programming for Reward Function Search",
  journal =      "IEEE Transactions on Autonomous Mental Development",
  year =         "2010",
  month =        jun,
  volume =       "2",
  number =       "2",
  pages =        "83--90",
  abstract =     "Reward functions in reinforcement learning have
                 largely been assumed to be given as part of the problem
                 being solved by the agent. However, the psychological
                 notion of intrinsic motivation has recently inspired
                 inquiry into whether there exist alternate reward
                 functions that enable an agent to learn a task more
                 easily than the natural task-based reward function
                 allows. We present a genetic programming algorithm to
                 search for alternate reward functions that improve
                 agent learning performance. We present experiments that
                 show the superiority of these reward functions,
                 demonstrate the possible scalability of our method, and
                 define three classes of problems where reward function
                 search might be particularly useful: distributions of
                 environments, nonstationary environments, and problems
                 with short agent lifetimes.",
  keywords =     "genetic algorithms, genetic programming, agent
                  learning performance, genetic programming algorithm,
                  intrinsic motivation, nonstationary environment,
                  psychological notion, reinforcement learning, task
                  based reward function, learning (artificial
                  intelligence)",
  DOI =          "doi:10.1109/TAMD.2010.2051436",
  ISSN =         "1943-0604",
  notes =        "Hungry-Thirsty: 6 by 6 closed square grid world;
                  food/water sources can only be found in two of the
                  four corners (12 possibilities). Q-learning, with a
                  GP-evolved 'shaping' function applied as an addition
                  to the usual RL reward scheme. Markov. Single agent
                  in one of 2*2*(2**36) states? GP has (may have)
                  agent's hunger, thirst, x,y co-ordinates, noise.
                  Float only (much like ordinary tree GP rather than
                  PushGP). Although statistically significant, the
                  improvement from GP-evolved rewards seems slight in
                  the static case but larger (Fig 4) in dynamic cases
                  and in experiments where agent lifetimes are short.

                 Java implementation of PushGP called Psh

                 Also known as \cite{5473118}",

Genetic Programming entries for Scott Niekum Andrew G Barto Lee Spector
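The reward scheme described in the notes (an evolved 'shaping' reward added to the usual task reward during Q-learning) can be sketched as follows. This is a minimal illustration only: it assumes a simplified single-resource grid with one goal corner, and the shaping() function is a hand-written stand-in for a GP-evolved program, not the paper's code; all names and parameter values are assumptions.

```python
import random

N = 6                    # 6 by 6 closed grid, as in the Hungry-Thirsty domain
GOAL = (N - 1, N - 1)    # assumed: a food source in one corner
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    """Move within the walled grid; state is an (x, y) pair."""
    x, y = state
    dx, dy = MOVES[action]
    return (min(max(x + dx, 0), N - 1), min(max(y + dy, 0), N - 1))

def shaping(state):
    """Stand-in for an evolved reward program over agent observations
    (the paper's programs may see hunger, thirst, x, y, noise);
    here simply a mild pull toward the goal corner."""
    x, y = state
    return -0.01 * (abs(x - GOAL[0]) + abs(y - GOAL[1]))

def q_learn(episodes=300, horizon=60, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with the shaping term added to the task reward."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(horizon):
            if rng.random() < eps:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda a: Q.get((s, a), 0.0))
            s2 = step(s, a)
            task_r = 1.0 if s2 == GOAL else 0.0
            r = task_r + shaping(s2)   # shaping applied as an addition
            best = max(Q.get((s2, b), 0.0) for b in range(4))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best
                                                      - Q.get((s, a), 0.0))
            s = s2
            if s == GOAL:
                break
    return Q

def greedy_rollout(Q, limit=100):
    """Follow the learned greedy policy from the start state."""
    s = (0, 0)
    for _ in range(limit):
        if s == GOAL:
            return True
        a = max(range(4), key=lambda a: Q.get((s, a), 0.0))
        s = step(s, a)
    return s == GOAL
```

A GP layer, as in the paper, would evolve shaping() itself, scoring each candidate program by the resulting agent's learning performance.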