Cooperative Behavior Acquisition by Learning and Evolution in a Multi-Agent Environment for Mobile Robots

Created by W.Langdon from gp-bibliography.bib Revision:1.4192

  author =       "Eiji Uchibe",
  title =        "Cooperative Behavior Acquisition by Learning and
                 Evolution in a Multi-Agent Environment for Mobile
                 Robots",
  school =       "Department of Mechanical Engineering for
                 Computer-Controlled Machinery, Osaka University",
  year =         "1999",
  type =         "Doctor of Engineering",
  month =        jan,
  keywords =     "genetic algorithms, genetic programming",
  URL =          "",
  size =         "134 pages",
  abstract =     "The objective of my research described in this
                 dissertation is to realize learning and evolutionary
                 methods for multiagent systems. This dissertation
                 mainly consists of four parts.

                 In Chapter 3, we propose a method that acquires
                 purposive behaviours based on the estimation of the
                 state vectors. In order to acquire cooperative
                 behaviours in multiagent environments, each learning
                 robot separately estimates a Local Prediction Model
                 (hereafter LPM) between the learner and each of the
                 other objects. The LPMs estimate the local
                 interactions, while reinforcement learning copes with
                 the global interaction between the multiple LPMs and
                 the given tasks. Based on the LPMs, which satisfy the
                 Markovian environment assumption as far as possible,
                 robots learn the desired behaviours using reinforcement
                 learning. We also propose a learning schedule in order
                 to make learning stable, especially in the early stage
                 of multiagent systems.

                 Chapter 4 discusses how an agent can develop its
                 behaviour according to the complexity of the
                 interactions with its environment. A method for
                 controlling the complexity is proposed for a
                 vision-based mobile robot. Based on the LPM, the agent
                 estimates the full set of state vectors together with
                 the order of the major vector components. The
                 environmental complexity is defined in terms of the
                 speed of the agent, while the complexity of the state
                 vector is the number of its dimensions. As the agent's
                 own speed or that of others increases, the dimension
                 of the state vector is increased, taking into account
                 the trade-off between the size of the state space and
                 the learning time.

                 Chapter 5 discusses a vector-valued reward function for
                 coping with multiple tasks. Unlike the traditional
                 weighted sum of several reward functions, we introduce
                 a discounted matrix to integrate them in order to
                 estimate the value function, which evaluates the
                 current action strategy. Owing to this extension of the
                 value function, the learning agent can appropriately
                 estimate the multiple future rewards from the
                 environment.

                 Chapter 6 discusses how cooperative behaviours can
                 emerge among multiple robots through co-evolutionary
                 processes. A genetic programming method is applied to
                 the individual population corresponding to each robot
                 so as to obtain cooperative and competitive behaviours.
                 The complexity of the problem is twofold: co-evolution
                 for cooperative behaviours needs exact synchronisation
                 of mutual evolutions, and three-robot co-evolution
                 requires suitably complicated environment setups that
                 may gradually change from simpler to more complicated
                 situations. As an example task, several
                 simplified soccer games are selected to show the
                 validity of the proposed methods. Finally, discussion
                 and concluding remarks on our work are given.",
  notes =        "Thesis Supervisor: Minoru Asada. Title: Professor of
                 Graduate School of Engineering, Department of Adaptive
                 Machine Systems, Osaka University. Thesis Committee:
                 Minoru Asada (Chair), Yoshiaki Shirai, Masao Ikeda.
                 Copyright 1999 Eiji Uchibe",
