URL: http://www.cs.bham.ac.uk/research/projects/cosy/deliverables/matrix/architectures/kitty/kitty-vision-draft.html
Last changed: 11 Jan 2006

DRAFT SPECIFICATION FOR INITIAL VISUAL CAPABILITIES
Initial approach to vision in Kitty

Arising from discussions in the Birmingham CoSy team
WARNING: DRAFT -- liable to change.

First draft proposal for work to be done at Birmingham on vision to support manipulation tasks in Kitty. This will provide a minimal capability. If more sophisticated visual capabilities suitable for Kitty become available, they can be used as appropriate.

In particular, in the long run it is crucial for PlayMate to have a good understanding of shape. This is far more important than recognising objects, since even a previously unseen object can be manipulated (e.g. grasped in various ways, pushed, pulled, picked up, turned over, put somewhere, etc.), and possibly also stretched, compressed, bent, twisted, etc., all of which involve seeing shapes in motion.

In the short run we simply ignore most of the problems of shape perception by using a small number of known shapes, including the robot's arm and hands. But that must be a temporary simplification.

Aims of vision system:

1. To enable the PlayMate to see the location and orientation of graspable/touchable objects on the table (restricted to a simple class of objects, e.g. rectangular polyhedra).

2. To enable the PlayMate to see the locations and orientations of the main parts of its arm:

The upright post
The two long joints (upper arm, forearm)
The wrist joint
The positions of the two fingers, and the gap between them

This includes seeing relationships between the hand and other things, as required for pointing at, touching, prodding, turning over, etc.

3. To enable the PlayMate to see movements of the arm and fingers, and of objects pushed by the fingers

Initially movements may be represented as either sequences of static states or abstract features of such sequences. In the long run we need to understand vision as primarily perception of process, not structure.
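
As a rough illustration of the two options just mentioned, the following Python sketch stores a movement as a sequence of static states and then derives a few abstract features from that sequence. The `Pose` fields, the feature names and the 5 mm stillness threshold are assumptions made for this example; they are not part of the proposal.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """One static state: position in metres, heading about the vertical axis in radians."""
    x: float
    y: float
    z: float
    yaw: float


def summarise_motion(states, still_threshold=0.005):
    """Derive abstract features from a sequence of static states (hypothetical feature set)."""
    if not states:
        raise ValueError("at least one state is required")
    first, last = states[0], states[-1]
    # Straight-line distance between the first and last positions.
    net = math.dist((first.x, first.y, first.z), (last.x, last.y, last.z))
    # Total distance travelled, summed over consecutive pairs of states.
    path = sum(
        math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        for a, b in zip(states, states[1:])
    )
    return {
        "net_displacement": net,
        "path_length": path,
        "moved": net > still_threshold,
        "net_rotation": last.yaw - first.yaw,
    }
```

A pushed block observed in three frames would be passed in as a list of three `Pose` values. The summary discards the intermediate detail, which is exactly the limitation that motivates treating vision as perception of process in the long run.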

Method:

a. Initially, hand-coded 3-D models of all the things to be seen will be provided for the vision system. The form of representation will be chosen to meet the requirements of model-based vision and to interface with other components, e.g. planning and arm control.

b. Use standard model-based techniques for locating objects in the images (using edge-features, region growing, or other standard image processing techniques to provide evidence for locations of the objects).

c. Use standard mathematical techniques for translating from image projections to 3-D descriptions.

d. Particle filters will be used to deal with uncertainty, and the most likely interpretation will be treated as the correct one, though we discussed the possibility of adding some fuzz to the specifications of location and orientation to avoid spurious precision. We also discussed using different coordinate systems for different purposes: e.g. spherical polar coordinates based on the top of the vertical segment of the arm, or based on the camera location, might be useful for some purposes.

e. Initially, relations between items (e.g. the object to be touched and the hand) will be represented using conventional mathematical representations (e.g. specifying the vector from the ends of the fingers to some specified part of the object), though more specialised (e.g. qualitative) representations may be derived from these, as needed for control of movement.
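
Step d above can be sketched as a basic bootstrap particle filter. This is only an illustrative sketch under stated assumptions: the state is reduced to a 2-D position on the table, the object is assumed to be roughly stationary, the observation noise is Gaussian, and the noise values are invented for the example. The function names are hypothetical. The estimate returned is the mean of the particles, used here as a simple stand-in for "the most likely interpretation", together with a per-axis spread that could serve as the "fuzz" on the reported location.

```python
import math
import random


def particle_filter_step(particles, observation, rng,
                         motion_noise=0.005, obs_noise=0.02):
    """One predict, weight, resample cycle over 2-D table positions (metres)."""
    # Predict: the object is assumed roughly stationary, so only add small jitter.
    predicted = [
        (x + rng.gauss(0.0, motion_noise), y + rng.gauss(0.0, motion_noise))
        for x, y in particles
    ]
    # Weight: Gaussian likelihood of the observed position given each particle.
    ox, oy = observation
    weights = [
        math.exp(-((x - ox) ** 2 + (y - oy) ** 2) / (2.0 * obs_noise ** 2))
        for x, y in predicted
    ]
    if sum(weights) == 0.0:
        # Every particle is implausible: fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw a new population in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate(particles):
    """Return the mean position and a per-axis standard deviation (the 'fuzz')."""
    n = len(particles)
    mx = sum(x for x, _ in particles) / n
    my = sum(y for _, y in particles) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in particles) / n)
    sy = math.sqrt(sum((y - my) ** 2 for _, y in particles) / n)
    return (mx, my), (sx, sy)
```

In use, the particles would be initialised over the visible table area and `particle_filter_step` called once per image, with `rng` a `random.Random` instance and `observation` the position suggested by the model-based matching in step b. Reporting the spread alongside the mean is one way to avoid the spurious precision mentioned above.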

NOTE:

Further research will be needed to identify good forms of representation of things, locations, and movements suitable for use in an action control system. In particular, from the output of the vision system we shall need to derive control signals for the arm, given a certain goal. I don't think we want to do this using inverse kinematics techniques; rather, the control signals should initially generate movements that reduce discrepancies between the current and desired positions.

Initially this could be done by restricting all robot actions to sequences of very short movements.
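
A minimal sketch of this idea, assuming the vision system reports the fingertip and target as 3-D points in a common frame (metres): each cycle computes the discrepancy vector described in item e, derives a coarse qualitative label from it, and commands one short move that reduces the discrepancy. All function names, the step size and the thresholds are hypothetical, and the simulated move is assumed to be executed perfectly, which a real arm would not do.

```python
import math


def displacement(fingertip, target):
    """Vector from the fingertip to the target point."""
    return tuple(t - f for f, t in zip(fingertip, target))


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in vec))


def qualitative_relation(vec, touch=0.002, near=0.05):
    """Coarse label derived from the metric vector (thresholds are assumptions)."""
    d = norm(vec)
    if d <= touch:
        return "touching"
    if d <= near:
        return "near"
    return "far"


def short_move(fingertip, target, max_step=0.01):
    """Next fingertip position: one step toward the target, at most max_step long."""
    vec = displacement(fingertip, target)
    dist = norm(vec)
    if dist <= max_step:
        return tuple(target)
    scale = max_step / dist
    return tuple(f + scale * c for f, c in zip(fingertip, vec))


def approach(fingertip, target, tolerance=0.002, max_moves=200):
    """Repeat short moves until the discrepancy is within tolerance."""
    path = [tuple(fingertip)]
    while norm(displacement(path[-1], target)) > tolerance and len(path) <= max_moves:
        # A real system would re-observe the fingertip here before moving again;
        # this simulation simply assumes the commanded move was achieved.
        path.append(short_move(path[-1], target))
    return path
```

Because each move is short and followed by a fresh look, errors in any single move are corrected on the next cycle rather than accumulating, which is the main reason for preferring this over a single planned trajectory at this stage.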

At a later stage (if the Katana firmware is changed as requested) we may be able to generate signals to alter movements while they are in progress. We are likely to require all movements to be very slow for this to work (e.g. to give time for visual processing). But that is fine in a research project.

NOTE:
One of the hardest problems will be to find good ways of representing shape, or more generally, surface structure, and its relationships to various kinds of manipulation actions.

Some researchers attempt to encode shape and affordance information in terms of correlations between motor signals and sensor data. That may suffice for insect intelligence but not for the kind of robot that has a human-like understanding of actions as performed by different individuals or by the same individual in different ways, as explained in this document.

A link to this note will be added to the requirements matrix in the box

input-competences X general

[to be continued]

