Last changed: 23 Jan 2006
Kitty is notionally a 30-month deliverable. However, given the way the EC operates, we are expected to have a working version by about month 24, i.e. September/October 2005. So there is very little time. Since none of the first-year implementation effort was devoted to manipulation (although a great deal of effort was put into analysing requirements for manipulation), we are starting from a very backward position, which includes waiting for changes to the firmware of the Katana arm that will allow us to develop the required arm-control software for a robot that can react to changes as they occur.
Because of this 'backward starting point' for the work on manipulation, the 'core' specification of Kitty will be very simple at first, so as to allow us to be reasonably confident of having a working system for a 'minimal' demonstration by September, as specified in Jeremy's email of 30th Nov 05.
Additional functionality may be specified but without any commitment to its working in time for the next major demonstration, though with the aim of delivering by month 30.
In any case the initial core architecture must be designed so that it is extendable later on to provide the functionality for month 30 and beyond.
We expect to use a rapid-prototyping methodology, which implies going rapidly for a first draft that may be entirely discarded and a new draft specified on the basis of what we have learnt. For that initial implementation we shall therefore use tools that speed up development and testing, disregarding long term efficiency. This is a well tested software engineering strategy.
1.a The current (episodic, situational) knowledge base:
(Sometimes called 'instance memory'. Things in here are treated as facts about the current and past environment.)
- what is in the world (= on the table)
- where the things are
- what their current relations are
- what their current movements are
- the status of current goals and actions
- current and recent state of arm reported by arm controller
- current and recent state of visual system
- current and recent linguistic inputs
(constantly inserted by linguistic interface)
- Other more general information about recent previous states
(needed later for learning.)
- Map like representation of current local environment, including current view and beyond.
- Perhaps more general map of more remote environment.
- Records of changes to maps
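The instance memory described above could be sketched as a time-stamped fact store. This is only a minimal illustration, not the planned implementation; all names (`Fact`, `EpisodicMemory`, the example predicates) are hypothetical.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Fact:
    """One entry in the episodic (instance) memory."""
    predicate: str             # e.g. "on", "moving", "arm_state"
    args: tuple                # the things the predicate relates
    timestamp: float = field(default_factory=time.time)

class EpisodicMemory:
    """Facts about the current and past environment, kept in arrival order
    so that recent previous states remain available (e.g. for learning)."""
    def __init__(self):
        self.facts = []

    def insert(self, predicate, *args):
        self.facts.append(Fact(predicate, args))

    def current(self, predicate):
        """Most recent fact with this predicate, or None."""
        for fact in reversed(self.facts):
            if fact.predicate == predicate:
                return fact
        return None

mem = EpisodicMemory()
mem.insert("on", "cup", "table")          # what is on the table
mem.insert("arm_state", "idle")           # state reported by arm controller
latest = mem.current("arm_state")
```

Keeping old facts rather than overwriting them is what makes "recent previous states" queryable later.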
1.b Temporary store for hypotheticals (e.g. used for possible explanations, or thinking about 'what if' questions, or tentative predictions, or hypotheses under test)
[Could be implemented in the instance memory with special tags, or in a separate memory]
1.c. Temporary store for unanswered questions
1.d. General knowledge about what is possible in the world
(sometimes called 'semantic knowledge')
- everything about the ontology (as defined in deliverable DR 2.1) that is explicitly formulated would be included here. (Some ontological assumptions may be implicit in the operation of sub-mechanisms.)
- knowledge of types of things that can be on the table, and types of motions that can occur
- consequences of various kinds of movements, e.g. how relationships change
- how other things may be caused to move
For Jeremy's experiments this will have to include actions like applying a force
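One familiar way to encode "what is possible in the world", including preconditions and consequences of actions such as applying a force, is a STRIPS-style schema. The sketch below is illustrative only; the `push` schema and helper names are hypothetical.

```python
# Hypothetical schema: what must hold before a push, and how relations change.
PUSH = {
    "action": "push",
    "params": ("obj", "src", "dst"),
    "preconditions": [("on", "obj", "src"), ("clear", "dst")],
    "adds": [("on", "obj", "dst")],
    "deletes": [("on", "obj", "src")],
}

def ground(pattern, bindings):
    """Replace parameter names in a pattern with their bound values."""
    return tuple(bindings.get(term, term) for term in pattern)

def applicable(schema, bindings, facts):
    """Check whether every instantiated precondition is a current fact."""
    return all(ground(p, bindings) in facts for p in schema["preconditions"])

facts = {("on", "cup", "table"), ("clear", "tray")}
ok = applicable(PUSH, {"obj": "cup", "src": "table", "dst": "tray"}, facts)
bad = applicable(PUSH, {"obj": "cup", "src": "box", "dst": "tray"}, facts)
```

Explicit adds/deletes lists are what a planner would later consume when deriving action sequences.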
low level image processor providing information about edges and regions.
(In some circumstances it may be useful to have an attention control mechanism that selects regions of the image to process.)
using an appropriate mixture of top-down and bottom-up processing, derive instances of the models representing perceived objects, and their locations, poses, etc.
using sequences of images, derive low level descriptions of what is moving and how it is moving (in 3-D)
(Later on: try to infer relationships partly from image clues,
e.g. occlusion cues, instead of leaving relations to be computed from model instances.
This could include things like time to contact.)
translator from internal representations of vision system to formalism of episodic memory.
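The translator mentioned above might, in a first draft, be a simple mapping from a vision-system record to episodic-memory tuples. The record layout below is an assumption made purely for illustration.

```python
def translate(vision_report):
    """Map a (hypothetical) vision-system record into episodic-memory
    tuples of the kind the rest of the architecture consumes."""
    facts = []
    for obj in vision_report["objects"]:
        facts.append(("present", obj["label"]))
        facts.append(("at", obj["label"], obj["position"]))
        if obj.get("moving"):
            facts.append(("moving", obj["label"], obj["velocity"]))
    return facts

report = {"objects": [
    {"label": "cup", "position": (0.2, 0.1, 0.0)},
    {"label": "ball", "position": (0.4, 0.3, 0.0),
     "moving": True, "velocity": (0.0, 0.05, 0.0)},
]}
facts = translate(report)
```

Keeping the translation in one place means the vision system's internal representations can change without disturbing the episodic-memory formalism.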
Adding a camera controller will require changes to other parts of the system to detect the need to alter the field of view, decide how to alter it, etc. (See motive generator, planner.)
Something that detects a problem (impending unwanted collision, or limitation of arm movement preventing intended effect) then generates alarm signal(s).
Initially this could just do two things: (a) cause all motion to freeze;
(b) insert information about the alarm and action taken into the database.
Later it may be able to distinguish immediately urgent problems and potential problems. The latter would not generate action immediately (e.g. freezing) but might add some factual information to the episodic memory and generate a goal to deal with the situation.
NOTE: inputs from everywhere, outputs to everywhere (potentially)
Alarm mechanisms need to be fast, using rapid pattern matching, and should be trainable.
The requirement for speed means there will sometimes be mistakes.
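The initial two-action alarm behaviour could be sketched as below. The pattern list stands in for the trainable rapid pattern matcher; all class and parameter names are hypothetical.

```python
class AlarmSystem:
    """Fast pattern-matched alarms: on a match, freeze all motion and
    record the alarm and the action taken."""
    def __init__(self, motors, memory):
        self.motors = motors      # anything with a freeze() method
        self.memory = memory      # list standing in for the episodic memory
        self.patterns = []        # (name, predicate-on-state) pairs

    def train(self, name, predicate):
        """Trainable in the crudest sense: new patterns can be added."""
        self.patterns.append((name, predicate))

    def check(self, state):
        for name, predicate in self.patterns:
            if predicate(state):
                self.motors.freeze()                        # (a) freeze
                self.memory.append(("alarm", name, state))  # (b) record
                return name
        return None

class Motors:
    def __init__(self): self.frozen = False
    def freeze(self): self.frozen = True

motors, memory = Motors(), []
alarms = AlarmSystem(motors, memory)
alarms.train("impending_collision", lambda s: s["gap_mm"] < 10)
fired = alarms.check({"gap_mm": 4})
```

Because matching is a shallow test on the current state, it is fast, and, as noted, sometimes wrong; the later urgent/potential distinction would replace the unconditional freeze with a generated goal.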
Later we may have a mechanism that monitors the current state of affairs and generates goals autonomously (e.g. find out why X did not happen, find out if something will get in the way of a planned movement, test an ...).
This requires the robot to have criteria for 'interestingness' of different sorts of percepts, actions, and goals.
Still later goals will be generated mainly by a learning system (as in 'altricial' species that learn by playing and exploring)
This may include semi-randomly generating questions to answer using a question-generator of the sort discussed in deliverable DR 2.1
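The role of 'interestingness' criteria can be made concrete with a toy goal generator. The scoring function and threshold here are illustrative assumptions, not proposals for the real criteria.

```python
def generate_goals(percepts, interestingness, threshold=0.5):
    """Autonomously propose goals for percepts whose interestingness
    score clears a threshold, most interesting first."""
    goals = []
    for p in percepts:
        score = interestingness(p)
        if score >= threshold:
            goals.append(("investigate", p, score))
    return sorted(goals, key=lambda g: -g[2])

# Toy criterion: unexpected events are interesting, expected ones are not.
expected = {"arm_at_rest"}
score = lambda p: 0.0 if p in expected else 0.9
goals = generate_goals(["arm_at_rest", "cup_vanished"], score)
```

A question generator of the DR 2.1 sort would slot in as another producer feeding the same threshold-and-rank mechanism.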
A deliberative system is one that can do reasoning or planning making use of structured representations of hypothetical states of affairs or actions, which may be compared, combined, and selected as the solution to a problem.
These representations may be of many forms, e.g. propositional, action sequence descriptions, or analogical (e.g. using maps, diagrams) or neural nets.
- The simplest case is a planner, i.e. a mechanism for deriving an action plan from the current situation and current main goal (action plans may be more or less complicated, and the planner may allow barge-in, i.e. anytime planning).
- There may also be mechanisms for answering questions like 'What would you have done if...?' or 'What would have happened if you had done X...?'
Including one global working memory
This can include deciding when to abandon a plan, or to modify a plan when new information arises.
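As a minimal sketch of the "simplest case" planner, the search below finds an action sequence from the current situation to the goal; the examination budget is a crude stand-in for anytime behaviour, since a caller can stop, react to new information, and re-invoke. The toy domain and all names are hypothetical.

```python
from collections import deque

def plan(start, goal, moves, budget=1000):
    """Breadth-first search for an action sequence; the budget caps the
    number of states examined so control can return to the caller
    (a rough stand-in for anytime planning with barge-in)."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier and budget > 0:
        budget -= 1
        state, steps = frontier.popleft()
        if state == goal:
            return steps
        for name, result in moves(state):
            if result not in seen:
                seen.add(result)
                frontier.append((result, steps + [name]))
    return None   # no plan found within the budget

# Toy domain: the cup is on one of three surfaces; pushes shift it along.
def moves(state):
    order = ["table", "tray", "box"]
    i = order.index(state)
    out = []
    if i + 1 < len(order):
        out.append((f"push_to_{order[i + 1]}", order[i + 1]))
    if i - 1 >= 0:
        out.append((f"push_to_{order[i - 1]}", order[i - 1]))
    return out

steps = plan("table", "box", moves)
```

Abandoning or modifying a plan then amounts to discarding `steps` and re-planning from the newly observed state.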
This may also have a mode in which it continually spouts a running commentary on what it is doing -- both what its external actions are and what it is doing internally. Later that sub-system may need to be made more intelligent regarding what is worth reporting.
Including another global working memory?
This takes current top level goal and associated plans and sends appropriate action commands and state queries to the arm controller and also constantly checks state as reported by visual system.
(Later it may also send commands to camera motors.)
More fast and fluent action will require more direct coupling between vision system and motor control system. That could be a product of learning in a later stage of the project.
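The execute-and-check cycle described above (send arm commands, constantly compare against the visually reported state) could be sketched as follows; the `Arm` and `Vision` stubs are hypothetical stand-ins for the real controllers.

```python
def execute(plan_steps, arm, vision, replan):
    """Send each command to the arm, then check the visual state; if the
    observed state diverges from the arm's report, ask for a new plan."""
    for step in plan_steps:
        arm.send(step)
        if vision.observed_state() != arm.reported_state():
            return replan()
    return "done"

class Arm:
    def __init__(self): self.state = "idle"
    def send(self, cmd): self.state = cmd
    def reported_state(self): return self.state

class Vision:
    def __init__(self, arm): self.arm = arm
    def observed_state(self): return self.arm.state  # ideal sensing in this toy

arm = Arm()
result = execute(["reach", "grasp", "lift"], arm, Vision(arm),
                 replan=lambda: "replanned")
```

The tighter vision-motor coupling mentioned for later would bypass this deliberate check-per-step loop.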
Details not yet specified but something like the ACT-R spreading activation mechanism will be implemented using something like Hebbian learning to produce a 'context' mechanism, with a decay mechanism to implement some aspects of short term memory.
This could be a form of attention control and a mechanism for simple serendipitous learning.
Note that for planning purposes it will be necessary also to have explicit generalisations about preconditions and consequences of actions and events.
There may need to be some high level control mechanisms that can alter parameters in the network - e.g. sensitivity, thresholds, decay rates.
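Since details are not yet specified, the following is only one possible reading of the mechanism just described: Hebbian strengthening of links between co-active nodes, spreading along those links, and per-cycle decay as the short-term memory component. All names and parameter values are assumptions.

```python
class ActivationNet:
    """Rough ACT-R-like context mechanism: co-active nodes get their link
    strengthened Hebbian-style, activation spreads along learnt links,
    and everything decays each cycle (short-term memory fade)."""
    def __init__(self, decay=0.5, rate=0.1):
        self.act = {}     # node -> activation
        self.w = {}       # (a, b) -> link weight
        self.decay = decay   # high-level controllable parameters
        self.rate = rate

    def stimulate(self, node, amount=1.0):
        self.act[node] = self.act.get(node, 0.0) + amount

    def step(self):
        # Hebbian update: strengthen links between currently active nodes.
        active = [n for n, a in self.act.items() if a > 0.1]
        for a in active:
            for b in active:
                if a != b:
                    self.w[(a, b)] = self.w.get((a, b), 0.0) + self.rate
        # Spread along weighted links, then decay.
        new = dict(self.act)
        for (a, b), weight in self.w.items():
            new[b] = new.get(b, 0.0) + self.act.get(a, 0.0) * weight
        self.act = {n: v * self.decay for n, v in new.items()}

net = ActivationNet()
for _ in range(3):                   # 'cup' and 'table' co-occur repeatedly
    net.stimulate("cup"); net.stimulate("table"); net.step()
net.act = {}                         # let all activation fade completely
net.stimulate("cup"); net.step()     # now 'cup' alone primes 'table'
```

The decay rate, threshold, and learning rate are exactly the parameters a high-level control mechanism might later adjust.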
Examples of relevant learning tasks would include
and many more....
Learning mechanisms will use current and past information, plus task information, to draw conclusions about what causes what and under what conditions, perhaps learning about both causation as a network of conditional probabilities (Humean causation) and causation as necessary consequences of changing structural relations when complex structures interact, subject to constraints like rigidity and impenetrability (Kantian causation). The distinction is explored in COSY-PR-0506 and some of the ideas are elaborated in COSY-DP-0601.
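The Humean side, at its simplest, is co-occurrence counting over observed episodes. The sketch below is a minimal illustration (names hypothetical) and says nothing about the Kantian, structure-based side, which resists this kind of tabulation.

```python
from collections import Counter

class HumeanLearner:
    """Learn P(effect | cause) by counting co-occurrences in observed
    episodes: causation as a network of conditional probabilities."""
    def __init__(self):
        self.cause_counts = Counter()
        self.pair_counts = Counter()

    def observe(self, events_before, events_after):
        for c in events_before:
            self.cause_counts[c] += 1
            for e in events_after:
                self.pair_counts[(c, e)] += 1

    def p(self, effect, given):
        if self.cause_counts[given] == 0:
            return 0.0
        return self.pair_counts[(given, effect)] / self.cause_counts[given]

learner = HumeanLearner()
learner.observe({"push_cup"}, {"cup_moves"})
learner.observe({"push_cup"}, {"cup_moves"})
learner.observe({"push_cup"}, {"cup_stays"})   # e.g. the cup was wedged
prob = learner.p("cup_moves", given="push_cup")
```

The "under what conditions" part would require conditioning the counts on context, which is where this naive tabulation starts to blow up.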
This can start with a mixture of low level programming commands and graphical tools, possibly extended to allow a simplified natural language interface to be used, e.g. to insert goals, interrogate data-structures, send low level commands to subsystems, etc.
This should be able to take information from current databases and translate it into sentences, to be typed on terminal or fed to speech synthesiser.
It should also have a language input controller which constantly waits either for typed or spoken input, parses and interprets it and adds the results to the database.
Initially there may be only simple commands, questions and assertions, but later (for the 'Philosopher' scenario) we want to have warnings, interrupts, advice (during reasoning, acting, or planning), discussion of the robot's thought processes and perceptual experiences, and discussions about hypothetical situations, to test the robot's self-understanding.
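For the initial simple commands, questions and assertions, both directions of the language interface could be as crude as templates and keyword parsing. The templates and the tiny grammar below are illustrative assumptions only.

```python
def facts_to_sentences(facts):
    """Render episodic-memory tuples as English sentences, to be typed
    on a terminal or fed to a speech synthesiser."""
    templates = {
        "on": "The {0} is on the {1}.",
        "moving": "The {0} is moving.",
        "alarm": "Warning: {0}.",
    }
    return [templates[f[0]].format(*f[1:]) for f in facts if f[0] in templates]

def parse(utterance):
    """Tiny toy grammar for typed input: commands, questions, assertions."""
    words = utterance.lower().rstrip("?.!").split()
    if words[0] == "put":                     # command: "put cup on tray"
        return ("goal", ("on", words[1], words[3]))
    if words[0] == "is":                      # question: "is cup on table?"
        return ("question", ("on", words[1], words[3]))
    return ("assertion", tuple(words))

sentences = facts_to_sentences([("on", "cup", "table"), ("moving", "ball")])
parsed = parse("Put cup on tray")
```

Commands become goals and questions become queries against the database, while assertions are inserted as facts; the later warning/interrupt/discussion repertoire is well beyond this.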
The NLP system will start with some innate information about language including syntax and semantics of terms relating to the robot's world. At a later stage we'll consider ways of learning a language from scratch. (A huge amount of literature exists regarding how this might be done and there are major controversies. We may have to set up a seminar to decide how to take sides in the controversies in order best to serve CoSy.)
The NLP system will need a quite complex sub-architecture not represented here.
(See GJ's architectures).
linking declarative memories to other things
Later -- investigate ways of 'managing' the spreading activation.
Once the arm control system and the visual system are working
it would be fairly simple to put together the whole thing using
SimAgent to test out the ideas in a rapid prototyping environment.
Even before they are working it may be useful to simulate them in a simplified form to test out aspects of the architecture.
Once we have a proof of concept it can be re-implemented in a preferred framework, after we have re-evaluated the various options (CORBA, MARIE, CARMEN, etc.)
[to be continued, modified, corrected, implemented]