Computation Approaches for Continuous Reinforcement Learning Problems

Created by W.Langdon from gp-bibliography.bib Revision:1.4524

  author =       "Dimitrios Effraimidis",
  title =        "Computation Approaches for Continuous Reinforcement
                 Learning Problems",
  school =       "Department of Computer Science, University of",
  year =         "2016",
  address =      "UK",
  month =        sep,
  keywords =     "genetic algorithms, genetic programming",
  size =         "127 pages",
  abstract =     "Optimisation theory is at the heart of any control
                 process, in which we seek to control the behaviour of
                 a system through a set of actions. Linear control
                 problems have been studied extensively, and optimal
                 control laws have been identified for them. But the
                 world around us is highly non-linear and
                 unpredictable. For such dynamic systems, which lack
                 the convenient mathematical properties of their
                 linear counterparts, classic control theory breaks
                 down and other methods have to be employed. Nature,
                 however, thrives at optimising non-linear and highly
                 complicated systems. Evolutionary Computing (EC)
                 methods exploit nature's approach by imitating the
                 evolutionary process, thereby avoiding the need to
                 solve the control problem analytically.

                 Reinforcement Learning (RL), on the other hand, treats
                 the optimal control problem as a sequential one. At
                 every discrete time step an action is applied, and the
                 transition of the system to a new state is accompanied
                 by a single numerical value, the reward, which
                 indicates the quality of the control action. Even
                 though the feedback is limited to a single real
                 number, the introduction of the Temporal Difference
                 (TD) method made accurate prediction of value
                 functions possible. This paved the way for optimising
                 complex structures, such as Neural Networks, that are
                 used to approximate the value functions.

                 In this thesis we investigate the solution of
                 continuous Reinforcement Learning control problems by
                 EC methods. The reward accumulated over an episode of
                 such a problem provides enough information to
                 formulate the required measure, the fitness, used to
                 optimise a population of candidate solutions. In
                 particular, we explore the limits of applicability of
                 a specific branch of EC, Genetic Programming (GP). In
                 GP the evolving population consists of individuals
                 that translate directly into mathematical functions,
                 which can serve as control laws.

                 The major contribution of this thesis is the proposed
                 unification of these disparate Artificial Intelligence
                 paradigms. The information provided by the system is
                 exploited on a step-by-step basis by the RL part of
                 the proposed scheme and on an episodic basis by GP.
                 This makes it possible to augment the function set of
                 the GP scheme with adaptable Neural Networks. To
                 achieve stable behaviour of the RL part of the
                 system, a modification of the Actor-Critic algorithm
                 has been implemented.

                 Finally, we successfully apply the GP method to
                 multi-action control problems, extending the range of
                 problems that this method has been shown to solve. We
                 also investigate the capability of GP on problems from
                 the food industry. These problems are also non-linear,
                 and no definitive model describing their behaviour
                 exists.",
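
The Temporal Difference method mentioned in the abstract can be illustrated with a minimal sketch of tabular TD(0) prediction. The five-state random-walk task, the step size `alpha`, and the function name `td0_random_walk` are illustrative assumptions, not taken from the thesis. A lookup table also stands in for the Neural Network approximators the abstract refers to. The sketch shows how a value estimate can be learned from nothing more than scalar rewards, by bootstrapping from the current estimate of the next state.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on a hypothetical 5-state random walk.

    States 1..5 are non-terminal; 0 and 6 are terminal. Each step moves
    left or right with probability 0.5. Entering state 6 gives reward 1,
    every other transition gives 0, so the true values are V(s) = s / 6.
    """
    rng = random.Random(seed)
    V = [0.0] * 7                 # V[0] and V[6] stay 0 (terminal states)
    for s in range(1, 6):
        V[s] = 0.5                # initial guess for non-terminal states
    for _ in range(episodes):
        s = 3                     # every episode starts in the centre
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:6]


if __name__ == "__main__":
    estimates = td0_random_walk()
    print([round(v, 2) for v in estimates])
```

Because only the transition into the right-hand terminal state pays a reward, the true values are 1/6, 2/6, ..., 5/6, so the printed estimates can be checked by eye. With a constant step size they fluctuate slightly around those values rather than converging exactly.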

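
The abstract describes GP individuals that translate directly into mathematical functions serving as control laws, each scored by the reward accumulated over an episode. The following is a minimal sketch under stated assumptions. Individuals are expression trees stored as nested tuples, the function set is {add, sub, mul}, and the plant is a hypothetical double integrator with a quadratic cost. None of these choices come from the thesis. Only fitness evaluation and selection of the fittest individual are shown; a full GP run would also apply crossover and mutation over many generations.

```python
import operator

# Hypothetical function set available to GP individuals (binary operators only).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, state):
    """Evaluate an expression tree on a state dictionary.

    A tree is a terminal (a variable name or a numeric constant) or a
    tuple (function_name, left_subtree, right_subtree).
    """
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, state), evaluate(right, state))
    if isinstance(tree, str):
        return state[tree]
    return float(tree)


def episode_fitness(tree, steps=200, dt=0.05):
    """Accumulated reward of one episode on a toy double integrator.

    Assumed dynamics: x'' = u, integrated with explicit Euler from
    x = 1, v = 0. The control is clipped to [-1, 1] and the per-step
    reward is -(x^2 + v^2), so a higher fitness is better.
    """
    x, v = 1.0, 0.0
    total = 0.0
    for _ in range(steps):
        u = evaluate(tree, {"x": x, "v": v})
        u = max(-1.0, min(1.0, u))
        x, v = x + dt * v, v + dt * u
        total += -(x * x + v * v)
    return total


if __name__ == "__main__":
    # A small hand-written population; in GP these would be generated
    # and varied automatically.
    population = [
        0.0,                                             # u = 0
        ("mul", -1.0, "x"),                              # u = -x
        ("sub", ("mul", -1.0, "x"), "v"),                # u = -x - v
        ("sub", ("mul", -2.0, "x"), ("mul", 2.0, "v")),  # u = -2x - 2v
    ]
    scores = [episode_fitness(ind) for ind in population]
    best = population[scores.index(max(scores))]
    print("scores:", [round(s, 1) for s in scores])
    print("best individual:", best)
```

The do-nothing individual `u = 0` leaves the state at x = 1, so its fitness is exactly -200. The damped law `u = -x - v` drives the state towards the origin and scores much higher. Selection acts on differences of this kind, using only the episodic reward and no analytical model of the plant.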