A general method for incremental self-improvement and multiagent learning

Created by W. Langdon from gp-bibliography.bib Revision:1.4420

  author =       "Juergen Schmidhuber",
  title =        "A general method for incremental self-improvement and
                 multiagent learning",
  booktitle =    "Evolutionary Computation: Theory and Applications",
  publisher =    "Scientific Publ. Co.",
  year =         "1999",
  editor =       "X. Yao",
  chapter =      "3",
  pages =        "81--123",
  address =      "Singapore",
  keywords =     "genetic algorithms, genetic programming",
  URL =          "ftp://ftp.idsia.ch/pub/juergen/xinbook.pdf",
  URL =          "http://www.idsia.ch/~juergen/xinbook/",
  abstract =     "I describe a novel paradigm for reinforcement learning
                 (RL) with limited computational resources in realistic,
                 non-resettable environments. The learner's policy is an
                 arbitrary modifiable algorithm mapping environmental
                 inputs and internal states to outputs and new internal
                 states. As in the real world, any event in system
                 life and any learning process computing policy
                 modifications may affect future performance and
                 preconditions of future learning processes. There is no
                 need for pre-defined trials. At a given time in system
                 life, there is only one single training example to
                 evaluate the current long-term usefulness of any given
                 previous policy modification, namely the average
                 reinforcement per time since that modification
                 occurred. At certain times in system life called
                 checkpoints, such singular observations are used by a
                 stack-based backtracking method which invalidates
                 certain previous policy modifications, such that the
                 history of still-valid modifications corresponds to a
                 history of long-term reinforcement accelerations (up
                 until the current checkpoint, each still-valid
                 modification has been followed by faster reinforcement
                 intake than all the previous ones). Until the next
                 checkpoint there is time to collect delayed
                 reinforcement and to execute additional policy
                 modifications; until then no previous policy
                 modifications are invalidated; and until then the
                 straightforward, temporary generalization assumption
                 is: each modification that until now appeared to
                 contribute to an overall speed-up will remain useful.
                 The paradigm provides a foundation for (1)
                 meta-learning, and (2) multi-agent learning. The
                 principles are illustrated in (1) a single,
                 self-referential, evolutionary system using an
                 assembler-like programming language to modify its own
                 policy, and to modify the way it modifies its policy,
                 etc., and (2) another evolutionary system consisting of
                 multiple agents, where each agent is in fact just a
                 connection in a fully recurrent RL neural net.",

Genetic Programming entries for Juergen Schmidhuber