URL: http://www.cs.bham.ac.uk/research/projects/cosy/deliverables/matrix/architectures/kitty/kitty-vision-draft.html
Last changed: 11 Jan 2006
In particular, in the long run it is crucial for PlayMate to have a good understanding of shape. This is far more important than recognising objects, since even a previously unseen object can be manipulated: e.g. grasped in various ways, pushed, pulled, picked up, turned over, put somewhere, etc., and possibly also stretched, compressed, bent, twisted, etc., all of which involve seeing shapes in motion.
In the short run we simply ignore most of the problems of shape perception by using a small number of known shapes, including the robot's arm and hands. But that must be a temporary simplification.
1. To enable the PlayMate to see the location and orientation of graspable/touchable objects on the table (restricted to a simple class of objects, e.g. rectangular polyhedrons)
2. To enable the PlayMate to see the locations and orientations of the main parts of its arm:
The upright post
The two long joints (upper arm, forearm)
The wrist joint
The positions of the two fingers, and the gap between them
This includes seeing relationships between the hand and other things, as required for pointing at, touching, prodding, turning over, etc.
3. To enable the PlayMate to see movements of the arm and fingers, and of objects pushed by the fingers
Initially movements may be represented as either sequences of static states or abstract features of such sequences. In the long run we need to understand vision as primarily perception of process, not structure.
a. Initially, hand-coded 3-D models of all the things to be seen will be provided for the vision system. The form of representation will be chosen to meet both the requirements of model-based vision and those of other components, e.g. planning, arm-control.
b. Use standard model-based techniques for locating objects in the images (using edge-features, region growing, or other standard image processing techniques to provide evidence for locations of the objects).
c. Use standard mathematical techniques for translating from image projections to 3-D descriptions.
d. Particle filters will be used to deal with uncertainty, and the most likely interpretation will be used as the correct interpretation, though we discussed the possibility of adding some fuzz to the specifications of location and orientation to avoid spurious precision. We also discussed using different coordinate systems for different purposes. E.g. spherical polar coordinates based on the top of the vertical segment of the arm, or based on the camera location, might be useful for some purposes.
e. Initially relations between items (e.g. object to be touched and the hand) will be represented using conventional mathematical representations (e.g. specifying vector from end of fingers to some specified part of the object), though more specialised (e.g. qualitative) representations may be derived from these, as needed for control of movement.
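As a rough illustration of steps d and e, the sketch below runs a very small particle filter over a few noisy 3-D position measurements, then computes the vector from a fingertip to the estimated position. It is only a sketch under stated assumptions, not the project's implementation: the function names, the Gaussian measurement model, the static-object assumption, the workspace bounds and all numeric values are hypothetical. The particle mean stands in for the "most likely interpretation", and the per-axis standard deviation of the particles stands in for the "fuzz" attached to the location.

```python
import math
import random


def estimate_position(measurements, lo, hi, n_particles=2000,
                      meas_sigma=0.02, jitter_sigma=0.005, seed=0):
    """Particle-filter estimate of a static 3-D position (metres).

    Returns (mean, spread): particle mean and per-axis standard deviation.
    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    dims = len(lo)
    # Start with particles spread uniformly over the workspace box.
    particles = [tuple(rng.uniform(lo[k], hi[k]) for k in range(dims))
                 for _ in range(n_particles)]
    for z in measurements:
        # Diffuse slightly so resampled duplicates do not collapse to a point.
        particles = [tuple(c + rng.gauss(0.0, jitter_sigma) for c in p)
                     for p in particles]
        # Weight each particle by an assumed Gaussian measurement likelihood.
        weights = [math.exp(-0.5 * sum((zc - pc) ** 2 for zc, pc in zip(z, p))
                            / meas_sigma ** 2)
                   for p in particles]
        if sum(weights) <= 0.0:
            weights = [1.0] * n_particles  # no particle explains z; keep all
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    mean = tuple(sum(p[k] for p in particles) / n_particles
                 for k in range(dims))
    spread = tuple(math.sqrt(sum((p[k] - mean[k]) ** 2 for p in particles)
                             / n_particles) for k in range(dims))
    return mean, spread


def relation_to_target(fingertip, target, spread, near=0.05):
    """Vector from fingertip to target, its length, and a coarse label."""
    vec = tuple(t - f for f, t in zip(fingertip, target))
    dist = math.sqrt(sum(c * c for c in vec))
    if dist <= max(spread):
        label = "at target (within uncertainty)"
    elif dist <= near:
        label = "near"
    else:
        label = "far"
    return vec, dist, label


# Hypothetical noisy detections of one point on an object, in metres.
measurements = [(0.26, 0.10, 0.05), (0.24, 0.11, 0.04),
                (0.25, 0.09, 0.06), (0.25, 0.10, 0.05)]
target, spread = estimate_position(measurements,
                                   lo=(0.0, 0.0, 0.0), hi=(0.4, 0.4, 0.2))
vec, dist, label = relation_to_target((0.25, 0.10, 0.20), target, spread)
print(vec, dist, label)
```

The coarse label is one example of a qualitative representation derived from the numerical one; a real controller would probably need richer relations (e.g. above, beside, aligned with an edge).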
Initially this could be done by restricting all robot actions to sequences of very short movements.
At a later stage (if the Katana firmware is changed as requested) we may be able to generate signals to alter movements while they are in progress. We are likely to require all movements to be very slow for this to work (e.g. to give time for visual processing). But that is fine in a research project.
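The initial option, restricting actions to sequences of very short movements, might look roughly like the sketch below. It assumes straight-line motion of the hand in Cartesian space, with a visual check after every step. The names `short_steps`, `execute_with_checks`, `move_to` and `keep_going` are hypothetical placeholders and are not part of any Katana interface.

```python
import math


def short_steps(start, goal, max_step):
    """Split a straight-line move into waypoints at most max_step apart."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    delta = [g - s for s, g in zip(start, goal)]
    dist = math.sqrt(sum(d * d for d in delta))
    n = max(1, math.ceil(dist / max_step))
    # Waypoints exclude the start and end exactly at the goal.
    return [tuple(s + d * i / n for s, d in zip(start, delta))
            for i in range(1, n + 1)]


def execute_with_checks(waypoints, move_to, keep_going):
    """Move one short step at a time, consulting vision after each step.

    move_to(waypoint) commands the arm (placeholder callback);
    keep_going(waypoint) returns False if vision says the plan should stop.
    """
    reached = []
    for wp in waypoints:
        move_to(wp)
        reached.append(wp)
        if not keep_going(wp):
            break
    return reached


# Example with stand-in callbacks: stop once the hand has passed x = 0.5.
waypoints = short_steps((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.25)
log = []
reached = execute_with_checks(waypoints, log.append, lambda wp: wp[0] < 0.5)
```

Because each step is short, the error accumulated before the next visual check is bounded by the step length, at the cost of slow overall motion.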
NOTE:
One of the hardest problems will be to find good ways of representing
shape, or more generally, surface structure, and its relationships to
various kinds of manipulation actions.
Some researchers attempt to encode shape and affordance information in terms of correlations between motor signals and sensor data. That may suffice for insect intelligence but not for the kind of robot that has a human-like understanding of actions as performed by different individuals or by the same individual in different ways, as explained in this document.
A link to this note will be added to the requirements matrix in the box
input-competences X general
[to be continued]