Following discussion on Monday 12th December, we have agreed that for the month 30 target for the PlayMate scenario ('Kitty') we wish to achieve the following target competences. However, for the month 24 demonstration a relatively small subset of these will have to be selected at the January planning meeting.
NOTE:
We are assuming that Marek's basic programmer tools for online control
of the Katana arm will be available, preferably after the Katana
firmware has been modified by Neuronics as requested, to allow suitable
control commands to be sent during motion caused by previous commands.
Exactly what comes out of that will affect the details of what follows.
Here are some of the basic capabilities in Kitty (including perception, motion, construction and use of internal representations, question formation, goal formation), on which other capabilities will be built.
- The robot should be able to see the location and pose of its hand, and to track motion of the hand (the motion may have to be kept slow for this to work).
All this may or may not involve seeing the other parts of the arm, possibly covered in some way to reduce the clutter.
- As Nick has pointed out, this raises questions about how locations will be represented. This is a large topic, introduced in relation to Fido here, to be specialised to Kitty here. Provisionally it seems likely that all spatial locations will be represented relative to identifiable spatial occupants, and those occupants may have to be identified in one or more maps relative to where the robot is. That raises the question of where the robot is when the robot is a complex object with cameras, body, arm and parts of the arm, all in different places. (Find references, e.g. Arnold Trehub, The Cognitive Brain, 1991.)
- Initially tracking may just involve continually identifying where the hand is in the scene, and its current direction of motion.
- Later it may be able to track by moving the head to keep the hand in the centre of view.
- Still later it may build a description of the path traversed, which might be used, for instance, to reverse the motion.
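The first and third tracking stages can be illustrated with a small sketch. It assumes that vision already delivers a timestamped 3-D hand position every frame; the `HandTrack` class and its methods are hypothetical names, not part of any existing CoSy or Katana software. The second stage (moving the head) is not covered.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class HandTrack:
    """Hypothetical record of visually observed hand positions."""
    samples: List[Tuple[float, Point]] = field(default_factory=list)

    def observe(self, t: float, pos: Point) -> None:
        # Stage 1: continually note where the hand is in the scene.
        self.samples.append((t, pos))

    def direction(self) -> Optional[Point]:
        """Unit vector of the most recent motion, or None if unknown or stationary."""
        if len(self.samples) < 2:
            return None
        (_, a), (_, b) = self.samples[-2], self.samples[-1]
        d = tuple(bi - ai for ai, bi in zip(a, b))
        norm = math.sqrt(sum(c * c for c in d))
        if norm == 0.0:
            return None
        return tuple(c / norm for c in d)

    def reverse_path(self) -> List[Point]:
        """Stage 3: waypoints that retrace the observed path back to its start."""
        return [pos for _, pos in reversed(self.samples[:-1])]
```

Reversing a motion would then amount to handing the waypoints from `reverse_path()`, one at a time, to whatever visually controlled motion mechanism is available.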
Example pictures of the hand can be found here
Note that the angle between the final joint (the wrist) and the rest of the arm is fixed at 90 degrees, but the final joint can rotate about the long axis of the preceding joint (like a propeller blade). The fingers can be opened and closed but cannot rotate about their long axis in this configuration.
NOTE: Other CoSy sites have the Katana arm with the final joint 'inline', which means that the fingers can rotate about their long axis, but they can approach an object only via the plane containing the whole arm and its mounting point.
- The robot should be able to send signals to the arm to generate various kinds of motions, and changes of ongoing motion, and should be able to use vision (and possibly other forms of feedback) to observe the effects of the motion and the changes of motion (e.g. altered rate of change of angle of a joint). This may depend on improvements to Katana arm firmware.
- We do not expect to use either conventional forward kinematics or inverse kinematics, as (at least initially) there will be no ballistic control. Instead, motions will be controlled through feedback, mainly visual feedback. Any ballistic motion capabilities will therefore have to be the result of learning, but that is not yet a requirement for Kitty. (If we make sufficient progress, such learning can be added; otherwise it will have to come later.)
- The robot should have one or more formalisms that it can use internally for representations of the state of the hand (e.g. its position, pose, motion).
The same formalism should be capable of expressing simple factual questions (generated by one of the question-forming transformations discussed in deliverable DR.02.01) and also of expressing goal states. Any proposition that can be true or false can specify a goal state; this includes existentially quantified propositions, e.g. 'Some X is held', which leave a choice as to the precise goal state.
The precise nature of the formalisms remains to be decided. One of them might be first order logic, or possibly a description logic. It is very likely that several different representations of the layout of the table, and of the position and motion of the hand, will be needed, including intermediate visual representations between the low-level visual data and high-level scene and process descriptions.
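To illustrate how one formalism can serve for facts, questions and goals, here is a toy sketch using ground atoms plus a single existential quantifier. It is not a proposal for the actual Kitty formalism, which remains open; the predicate names and the `Atom`/`Exists` classes are invented for the example. The same proposition object can be evaluated as a yes/no question against the current facts or checked as a goal, and for an existential goal `witnesses` lists the alternative ways of satisfying it, i.e. the choice it leaves open.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterator, Tuple, Union

Fact = Tuple[str, Tuple[str, ...]]  # e.g. ("held", ("cup",))


@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[str, ...]  # arguments starting with '?' are variables


@dataclass(frozen=True)
class Exists:
    var: str
    body: Atom


Prop = Union[Atom, Exists]


def witnesses(prop: Exists, facts: FrozenSet[Fact]) -> Iterator[str]:
    """Yield each value of prop.var that makes prop.body true in facts."""
    for pred, args in facts:
        if pred != prop.body.pred or len(args) != len(prop.body.args):
            continue
        value = None
        for pattern, actual in zip(prop.body.args, args):
            if pattern == prop.var:
                if value is not None and value != actual:
                    break  # the same variable would need two different values
                value = actual
            elif pattern != actual:
                break
        else:
            if value is not None:
                yield value


def holds(prop: Prop, facts: FrozenSet[Fact]) -> bool:
    """Answer a yes/no question, or test whether a goal state has been reached."""
    if isinstance(prop, Atom):
        return (prop.pred, prop.args) in facts
    return any(True for _ in witnesses(prop, facts))
```

Under this representation, a question-forming transformation could be modelled as a mapping from one proposition to another, for instance replacing a constant by a variable; that is an assumption of the sketch, not a summary of DR.02.01.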
B: Nature vs Nurture
Initially a great deal will be programmed into Kitty so that we have behaviours on which more things can be built.
That will be followed by analysis of competences, concepts, policies, etc. that should be learnt by the robot, followed by design of some (probably simple and illustrative) learning mechanisms which demonstrate how what we had programmed might instead have been learnt. This is still necessarily ill-defined.
Learning to move from here to there
We hope that one of the things the robot can eventually learn is how to move the hand (slowly if necessary) from any initial state to a final goal state under visual control, when there are no obstacles. (Learning to do it ballistically is a different requirement, not part of Kitty's initial specification.)
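The programmed version of this task can be sketched using only visual feedback, in line with the no-kinematics approach described above. The sketch assumes a simulated two-joint planar arm (`SimArm`, its link lengths and its `look` method are hypothetical stand-ins for the Katana arm and the vision system). The arm geometry lives only in the simulated world. The controller tries small joint changes, looks at the result, undoes each trial, and then makes whichever move brought the observed hand closest to the target. It stops when the hand is within tolerance or when no small move helps.

```python
import math
from typing import Sequence, Tuple


class SimArm:
    """Hypothetical two-joint planar arm plus camera; the controller never uses its geometry."""

    LINK1, LINK2 = 1.0, 0.8  # hypothetical link lengths

    def __init__(self, joints: Sequence[float]) -> None:
        self.joints = list(joints)

    def move_joint(self, i: int, delta: float) -> None:
        self.joints[i] += delta

    def look(self) -> Tuple[float, float]:
        """Stand-in for vision: where the hand currently appears."""
        a, b = self.joints
        return (self.LINK1 * math.cos(a) + self.LINK2 * math.cos(a + b),
                self.LINK1 * math.sin(a) + self.LINK2 * math.sin(a + b))


def gap(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def servo_to(arm: SimArm, goal: Tuple[float, float], step: float = 0.01,
             tol: float = 0.05, max_cycles: int = 5000) -> bool:
    """Greedy trial-and-look control with no forward or inverse kinematics."""
    for _ in range(max_cycles):
        current = gap(arm.look(), goal)
        if current <= tol:
            return True
        best = None  # (observed gap, joint index, delta) of the best small move
        for i in range(len(arm.joints)):
            for delta in (step, -step):
                arm.move_joint(i, delta)       # try a small move...
                trial = gap(arm.look(), goal)  # ...look at the result...
                arm.move_joint(i, -delta)      # ...and undo it
                if trial < current and (best is None or trial < best[0]):
                    best = (trial, i, delta)
        if best is None:
            return False  # no small move helps: give up rather than loop forever
        arm.move_joint(best[1], best[2])
    return False
```

Undoing every trial is wasteful. It is used here only to show that reaching can be driven entirely by what is seen. A practical controller would more likely estimate from recent observed motion which joint changes help.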
- Ideally the robot should start with a primitive capability to generate movements, along with the ability to see its hand, and by experimenting with movements it should learn how to perform the previous task. However, initially this capability will be programmed, to help us investigate the problems.
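Learning by experimenting with movements can be sketched as 'motor babbling'. The robot issues random commands, stores each (command, observed displacement) pair, and later chooses the stored command whose remembered effect points most nearly from the current hand position towards the goal. This nearest-neighbour memory is only one illustrative choice of learner. `effect`, `BabbleMemory` and `reach` are hypothetical. The sketch also assumes that a command's effect does not depend on the arm's configuration, which is unrealistic. A fuller version would key the memory on the current state as well, along the lines of the input described in the list of learning mechanisms below.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float]


def effect(command: Vec) -> Vec:
    """Hypothetical world: hand displacement seen by the camera after a command.
    The learner never looks inside; here the camera happens to be rotated
    30 degrees relative to the motor axes and each command moves the hand 5 cm."""
    c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
    u, v = command
    return (0.05 * (c * u - s * v), 0.05 * (s * u + c * v))


class BabbleMemory:
    """Nearest-neighbour store of (command, observed displacement) examples."""

    def __init__(self) -> None:
        self.examples: List[Tuple[Vec, Vec]] = []

    def babble(self, n: int, rng: random.Random) -> None:
        """Issue n random unit-length commands and remember what each did."""
        for _ in range(n):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            command = (math.cos(angle), math.sin(angle))
            self.examples.append((command, effect(command)))

    def choose(self, current: Vec, goal: Vec) -> Vec:
        """Map (current state, goal state) to the remembered command whose
        observed effect is best aligned with the direction towards the goal."""
        want = (goal[0] - current[0], goal[1] - current[1])

        def alignment(example: Tuple[Vec, Vec]) -> float:
            dx, dy = example[1]
            return (dx * want[0] + dy * want[1]) / math.hypot(dx, dy)

        return max(self.examples, key=alignment)[0]


def reach(memory: BabbleMemory, start: Vec, goal: Vec,
          tol: float = 0.05, max_steps: int = 200) -> Vec:
    """Repeatedly apply the chosen command and observe where the hand ends up."""
    pos = start
    for _ in range(max_steps):
        if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) <= tol:
            break
        dx, dy = effect(memory.choose(pos, goal))
        pos = (pos[0] + dx, pos[1] + dy)
    return pos
```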
Different forms of learning might be investigated, including:
- 'Chunking' mechanisms that store generalised versions of explicitly constructed successful plans.
- Neural nets or other mechanisms that learn extendable mappings between inputs and outputs from examples, e.g. where an input is a combination of
  - current state (including perceived hand position and velocity and any proprioceptive feedback) and
  - goal state,
  and an output is an executable specification of a modification to the current state that brings the robot closer to the goal state.
A more sophisticated later version (probably after Kitty) could output a plan or sequence of commands to achieve the goal state. However, any such learnt complex action must be executable in interruptible mode, concurrently with visual and other monitoring.
NOTE: the learning mechanisms explored either during Kitty development or before the end of CoSy should include the kinds of chunking-plus-syntactic-composition that we have discussed in connection with altricial species, e.g. in the York GC7 paper, the IJCAI paper and Child as scientist (search for Selfridge).
- During the learning process some goals may be inserted by a human, while others may be automatically generated by an 'autonomous' learning mechanism.
One of the research tasks is to identify kinds of autonomy that could sensibly be given to the robot so that it can take actions without always having to be driven by human commands or questions.
We can distinguish two main kinds of innate autonomy:
- Generating and acting on goals during playful exploration of the environment that is part of the 'altricial' learning process. For this we may need Kitty spontaneously to generate goals to
  - look at something
  - perform some action with the arm
  - answer some internally generated question
  - perform a complex task that involves subgoals
  - check whether its ontology needs to be extended (a by-product of some surprising experience, which could be a failed prediction, a failed action or an unexpectedly successful action).
The mechanisms that generate such goals may include preferences for some things as being worth recording because they are 'interesting'. Different classes of interestingness need to be distinguished. E.g. some will be purely 'aesthetic', such as noticing repeatability, symmetry or simplicity. Others may include goals generated by 'surprise', i.e. expectation violations triggering exploratory investigations.
- Protective mechanisms such as an 'alarm' mechanism that detects impending collisions and terminates current movements, or generates an avoidant movement.
Some AI/robotic researchers compensate for the lack of biological needs for food, drink, warmth, etc. by simulating those needs in an artificial way, e.g. having a simulated energy level which is capable of triggering movement towards an energy source when the level gets low. We shall probably find that there are enough interesting effects produced by the preceding kinds of autonomy without having to add this, but we should keep an open mind.
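Both kinds of innate autonomy can be pictured as motive generators that inspect each new percept and may push goals onto an agenda. The sketch below assumes that percepts arrive as dictionaries with an expected and an observed position and a distance to the nearest obstacle. The thresholds, field names and goal descriptions are all hypothetical. Surprise (an expectation violation above a threshold) produces an exploratory goal, while the alarm pre-empts everything by demanding a halt. 'Aesthetic' interestingness is not modelled.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import List

SURPRISE_THRESHOLD = 0.10  # hypothetical: prediction error (m) counted as surprising
SAFETY_MARGIN = 0.05       # hypothetical: obstacle distance (m) that triggers the alarm


@dataclass(order=True)
class Goal:
    priority: int                            # lower value = more urgent
    description: str = field(compare=False)


def generate_motives(percept: dict, agenda: List[Goal]) -> bool:
    """Inspect one percept, push any new goals onto the agenda (a heap),
    and return True if the protective alarm fired."""
    if percept["obstacle_distance"] < SAFETY_MARGIN:
        # Alarm: pre-empt everything else by demanding an immediate halt.
        heapq.heappush(agenda, Goal(0, "halt current movement"))
        return True
    error = math.dist(percept["expected"], percept["observed"])
    if error > SURPRISE_THRESHOLD:
        # Surprise: an expectation violation triggers exploratory investigation.
        heapq.heappush(agenda, Goal(5, f"investigate surprise at {percept['observed']}"))
    return False
```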
- The goal formalism should interface to a planning system that can generate executable plans to move the hand to a goal state. One form of learning (as in SOAR and other systems) would be storing generalised plans (or plan fragments?) for future use. (One of several things called "chunking".)
- There should be a plan execution system that can execute plans concurrently with visual monitoring, with the capability of modifying or aborting the plan execution at any time.
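The following sketch shows one simple form of interruptible execution. Plan steps run one at a time, and after each step a monitor, standing in for visual monitoring, may let execution continue, abort it, or replace all the remaining steps. Representing steps as plain callables and the monitor's string or list verdicts are assumptions made for illustration. The sketch also checks only between steps instead of running the monitor truly concurrently.

```python
from collections import deque
from typing import Callable, List, Union

Step = Callable[[], str]          # hypothetical: performs one action, returns a label
Verdict = Union[str, List[Step]]  # "continue", "abort", or replacement steps


def execute(plan: List[Step], monitor: Callable[[str], Verdict]) -> List[str]:
    """Run plan steps one at a time; after each, the monitor may let execution
    continue, abort it, or replace all remaining steps (replanning)."""
    done: List[str] = []
    pending = deque(plan)
    while pending:
        done.append(pending.popleft()())
        verdict = monitor(done[-1])
        if verdict == "abort":
            break
        if isinstance(verdict, list):
            pending = deque(verdict)  # discard the rest of the plan
    return done
```

Interrupting only at step boundaries works only if steps are short. Finer-grained interruption would require sending commands to the arm while it is still moving, which depends on the Katana firmware changes mentioned at the start.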
C: Motion goals for Kitty
The end-of-project CoSy robot should be able to produce motions that involve effects not only on the robot's hand, but also on objects held or pushed and other objects in contact with those objects. During the Kitty phase it may not be possible to achieve that, so our initial Kitty objectives are restricted.
NOTE: Under some circumstances the view of the hand will be partly or wholly occluded by other parts of the arm. In the long run the robot will have to be able to perform actions that involve going through such states. This may require the use of representations of enduring processes, as discussed in this presentation on vision.
- The robot should be able to generate or be given a goal to get the hand close to some location in 3-D space above the table, and should be able to move it to that location using visual control.
- If an obstacle is put in the way during the motion, that should be detected and a goal generated to halt the motion. This has strong architectural implications.
- In a more sophisticated version the robot should be able to generate a goal to avoid the new obstacle while continuing the motion if possible, and only halt if that is not possible.
NOTE: It would be interesting to see if that learning (to avoid rather than stop) could be driven by an innate preference not to abandon goals. Whether that should be part of the Kitty project will depend on progress in the rest of this scenario.
- There will be some manipulative goals described in the manipulation section.
D: Manipulation in Kitty
Manipulation tasks are partly derived from Jeremy's document regarding tasks for CoSy, and will build on the perceptual and motion capabilities of Kitty described above.
Manipulative tasks can be defined as physical actions performed by the robot with its hand in relation to objects in the scene. Examples for Kitty might be:
- Pointing: moving the hand to a specified object, or to a specified part of a specified object
- Tracking with the hand: moving the robot's hand to follow a moving object (e.g. something moved by the human, possibly the human's hand itself)
- Applying a force to an object in order to move it, e.g. push it horizontally, rotate it.
- In the longer term we would expect the robot to be able to grasp some objects, pick them up, put them down, etc., performing various different sorts of more complex tasks. However it may not be possible to achieve any of those during the time available for Kitty. Nevertheless the work during the Kitty development period should include some analysis of requirements for such tasks including the planning capabilities.
Add anything produced by Marek in his PhD work. See especially scenarios in Section 5.
(The time scale for those is longer than for Kitty.)
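Tracking with the hand, listed above, can be sketched as a loop that moves the hand, at each visual frame, a bounded distance towards the object's latest observed position. The sketch assumes that the hand can be commanded to make a given Cartesian displacement, for instance by visual feedback control. The speed limit and the object trajectory in the usage check are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Vec = Tuple[float, float, float]


def follow(hand: Vec, object_positions: Iterable[Vec], max_step: float) -> List[Vec]:
    """Each frame, move the hand towards the object's latest observed position,
    but by at most max_step (keeping the motion slow enough to monitor visually)."""
    path = [hand]
    for target in object_positions:
        offset = [t - h for t, h in zip(target, hand)]
        dist = math.sqrt(sum(c * c for c in offset))
        scale = 1.0 if dist <= max_step else max_step / dist
        hand = tuple(h + scale * c for h, c in zip(hand, offset))
        path.append(hand)
    return path
```

If the object moves more slowly than `max_step` per frame, the hand eventually catches up and then stays with it. If the object moves faster, the hand falls further and further behind, which the robot could notice as a failure.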
E: Self-understanding in Kitty
Some self-monitoring of external behaviour
Kitty will have very little self-monitoring/recording of internal processes (as needed for learning, for answering questions, for generating some new subgoals, for detecting and dealing with conflicting goals?).
However, it should be able to detect that some expectation has been violated, that some goal has been achieved, or that something new has been perceived that is potentially more worthy of attention than the preceding focus of attention. Exactly what the consequences of these self-observations are still needs to be worked out. However, it is very likely that surprises and failures will be the major drivers of ontology formation.
Instead of arbitrary ostensive definitions (human points and says 'this is a porcupine') we expect new concepts to arise from differentiation and merging based on old concepts, driven by the robot's own observations. This presupposes a suitable framework for the ontological extensions, as discussed here.
F: Natural language use in Kitty
There will be two kinds of use of NLP in Kitty.
During development and testing it will be useful to be able to interrogate data-structures and processes in Kitty, and to insert goals. For this purpose we need a fairly simple natural language interface, with a grammar designed to cope with a debugging/testing ontology.
One way to do this would be to have a process running concurrently with everything else which is 'listening' for keyboard input (later we could add speech, but mainly for demonstration purposes). Whenever a new question or command comes in, it will be parsed, transformed into a form that can engage with the internal representations used by CoSy, and appropriate action taken. This could be to do something like:
- add a new goal for Kitty
- change the priority of an existing goal
- remove a goal
- give a direct action command
- change or terminate some action
- interrogate some data structure and report its contents
- change some data structure (e.g. cause some object to be hallucinated, or correct some mistake produced by the visual system)
- turn some tracing on or off
Output will not be restricted to verbal communication. In some cases it will be more appropriate to print the contents of a data-structure into a file that can be examined later, or directly to the screen. In other cases the verbal question or command will produce a change in the graphical display. In some cases it may initiate a special kind of dialogue, e.g. cause a graphical control panel to be displayed. Exactly which kinds of output will be needed in Kitty is a topic for further requirements analysis, and our understanding of requirements will expand during development and testing.
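Here is a sketch of such a concurrently running command listener. It assumes typed commands in a tiny fixed grammar. The command forms, the `KittyState` fields and the replies are hypothetical, chosen to mirror a few of the items above. A background thread puts each input line on a queue, and the main control loop drains the queue between its other activities, so typed commands never block perception or action.

```python
import queue
import re
import threading
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class KittyState:
    """Hypothetical fragment of Kitty's state visible to the debugging interface."""
    goals: Dict[str, int] = field(default_factory=dict)  # goal -> priority
    tracing: Set[str] = field(default_factory=set)       # subsystems being traced


GRAMMAR = [  # hypothetical command forms for a tiny debugging grammar
    ("add", re.compile(r"add goal (.+) priority (\d+)$")),
    ("reprioritise", re.compile(r"set priority of (.+) to (\d+)$")),
    ("remove", re.compile(r"remove goal (.+)$")),
    ("trace", re.compile(r"trace (on|off) (\w+)$")),
    ("show", re.compile(r"show goals$")),
]


def handle(line: str, state: KittyState) -> str:
    """Parse one typed command, apply it to the internal state, and return a reply."""
    text = line.strip()
    for kind, pattern in GRAMMAR:
        m = pattern.match(text)
        if m is None:
            continue
        if kind == "add":
            state.goals[m.group(1)] = int(m.group(2))
        elif kind == "reprioritise":
            if m.group(1) not in state.goals:
                return f"no such goal: {m.group(1)}"
            state.goals[m.group(1)] = int(m.group(2))
        elif kind == "remove":
            state.goals.pop(m.group(1), None)
        elif kind == "trace":
            if m.group(1) == "on":
                state.tracing.add(m.group(2))
            else:
                state.tracing.discard(m.group(2))
        else:  # "show": interrogate a data structure and report its contents
            return ", ".join(f"{g} ({p})" for g, p in sorted(state.goals.items())) or "no goals"
        return "ok"
    return f"not understood: {text}"


def start_listener(lines: Iterable[str], inbox: queue.Queue) -> threading.Thread:
    """Read lines (e.g. from sys.stdin) on a background thread and queue them."""
    def run() -> None:
        for line in lines:
            inbox.put(line)
    listener = threading.Thread(target=run, daemon=True)
    listener.start()
    return listener


def drain(inbox: queue.Queue, state: KittyState) -> List[str]:
    """Called from the main control loop: handle any waiting commands without blocking."""
    replies = []
    while True:
        try:
            replies.append(handle(inbox.get_nowait(), state))
        except queue.Empty:
            return replies
```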
F2: Talking to the robot
Use of English for 'real' communication.
TO BE REVISED/EXTENDED
What will Kitty know about the human?
Initially there will be only two roles for the human, neither of which will be understood by Kitty:
- The human will be an unknown cause of certain kinds of observed motions of objects on the table: the robot should notice the motion and react to it in a manner that depends on its task (e.g. following a moving object visually, or with its finger).
- The human will use the NLP interface for development and testing, and this could include directly changing Kitty's goals or actions.
Later, when the 'real' NLP interface is functional, the human will be an unknown source of some questions and commands. It is unlikely that there will be time to add any collaborative capability to Kitty, though we hope this will be done for CoSy.
Kitty will have no knowledge of the existence of other agents with percepts, desires, goals/intentions, plans, preferences, competences, beliefs, location, body, ...
TO BE REVISED/EXTENDED
H: Putting it all together: architectural requirements for Kitty
Reactive mechanisms and training for fluency will be missing or minimal (apart from simple alarms and motive generators?).
There will be a fairly rich deliberative component with learning at the deliberative level.
Self-monitoring and meta-management will be very primitive, e.g. remembering what the robot did, and simple self-debugging and explanation capabilities.
Perhaps detection and resolution of conflicts, including high-level attention control, concurrent with low-level sensory and motor attention control?
Vision and motor control will both need layered sub-architectures engaging (concurrently) with different parts of the central architecture.
Natural language capabilities will be an add-on, as happened in evolution and in child development.
A partial specification for the architecture is in http://www.cs.bham.ac.uk/research/projects/cosy/matrix/architectures/kitty
Requirements for representations in Kitty are partly in Deliverable 2.1, but further work is needed. More information will be added later e.g. here.
TO BE REVISED/EXTENDED
Most of the non-spatial entities Fido needs to know about will not be known to Kitty.
TO BE REVISED/EXTENDED