Contributions to Simulation-based High-dimensional Sequential Decision Making

Created by W.Langdon from gp-bibliography.bib Revision:1.3872

  author =       "Jean-Baptiste Hoock",
  title =        "Contributions to Simulation-based High-dimensional
                 Sequential Decision Making",
  school =       "Universit{\'e} Paris Sud - Paris XI",
  address =      "France",
  year =         "2013",
  month =        apr # "~10",
  keywords =     "genetic algorithms, genetic programming, GP, computer
                 science/other, informatique/autre, Monte Carlo tree
                 search, learning from simulations, high-dimensional
                 sequential decision making, games, planning, Markov
                 decision process, MoGo, MASH",
  bibsource =    "OAI-PMH server at",
  identifier =   "2013PA112053",
  language =     "ENG",
  oai =          "",
  URL =          "",
  URL =          "",
  URL =          "",
  size =         "238 pages",
  abstract =     "My thesis is entitled Contributions to
                 Simulation-based High-dimensional Sequential Decision
                 Making. The context of the thesis is about games,
                 planning and Markov Decision Processes.

                 An agent interacts with its environment by successively
                 making decisions. The agent starts from an initial
                 state until a final state in which the agent can not
                 make decision anymore. At each time-step, the agent
                 receives an observation of the state of the
                 environment. From this observation and its knowledge,
                 the agent makes a decision which modifies the state of
                 the environment. Then, the agent receives a reward and
                 a new observation. The goal is to maximise the sum of
                 rewards obtained during a simulation from an initial
                 state to a final state. The policy of the agent is the
                 function which, from the history of observations,
                 returns a decision.

                 We work in a context where (i) the number of states is
                 huge, (ii) reward carries little information, (iii) the
                 probability to reach quickly a good final state is weak
                 and (iv) prior knowledge is either nonexistent or
                 hardly exploitable. Both applications described in this
                 thesis present these constraints : the game of Go and a
                 3D simulator of the European project MASH (Massive Sets
                 of Heuristics).

                 In order to take a satisfying decision in this context,
                 several solutions are brought :

                 1. Simulating with the compromise
                 exploration/exploitation (MCTS)

                 2. Reducing the complexity by local solving

                 3. Building a policy which improves itself (RBGP)

                 4. Learning prior knowledge (CluVo+GMCTS)

                 Monte-Carlo Tree Search (MCTS) is the state of the art
                 for the game of Go. From a model of the environment,
                 MCTS builds incrementally and asymmetrically a tree of
                 possible futures by performing Monte-Carlo simulations.
                 The tree starts from the current observation of the
                 agent. The agent switches between the exploration of
                 the model and the exploitation of decisions which
                 statistically give a good cumulative reward. We discuss
                 2 ways for improving MCTS : the parallelisation and the
                 addition of prior knowledge.

                 The parallelisation does not solve some weaknesses of
                 MCTS; in particular some local problems remain
                 challenges. We propose an algorithm (GoldenEye) which
                 is composed of 2 parts : detection of a local problem
                 and then its resolution. The algorithm of resolution
                 reuses some concepts of MCTS and it solves difficult
                 problems of a classical database.

                 The addition of prior knowledge by hand is laborious
                 and boring. We propose a method called Racing-based
                 Genetic Programming (RBGP) in order to add
                 automatically prior knowledge. The strong point is that
                 RBGP rigorously validates the addition of a prior
                 knowledge and RBGP can be used for building a policy
                 (instead of only optimising an algorithm).

                 In some applications such as MASH, simulations are too
                 expensive in time and there is no prior knowledge and
                 no model of the environment; therefore Monte-Carlo Tree
                 Search can not be used. So that MCTS becomes usable in
                 this context, we propose a method for learning prior
                 knowledge (CluVo). Then we use pieces of prior
                 knowledge for improving the rapidity of learning of the
                 agent and for building a model, too. We use from this
                 model an adapted version of Monte-Carlo Tree Search
                 (GMCTS). This method solves difficult problems of MASH
                 and gives good results in an application to a word
  notes =        "p116 noisy GP progress rate. Hoeffding and Bernstein

Genetic Programming entries for Jean-Baptiste Hoock