URL: http://www.cs.bham.ac.uk/research/projects/cosy/deliverables/matrix/general-input/axs-fido.html
Last changed: 10 Dec 2005

General Issues about Perception in Fido
And some comments on CoSy's requirements

Fido, the domestic robot of the long term future, described here, will have several forms of perception, including vision, hearing, touch, proprioception, temperature, and probably a host of internal sensors monitoring physical needs. It will probably not need taste for its own purposes (except in some wildly futuristic scenarios), but it may be useful for a domestic robot to do taste testing if it is preparing food for someone who is disabled.

The end-of-project robots in CoSy (PlayMate and Explorer) will have at most

In contrast, Fido may have powerful new sensing devices, including cameras with much higher resolution and frame rates than now, and vast amounts of computing power and processing memory, all available in compact, lightweight physical packaging.

The functions of the various perceptual systems in Fido will need to be derived from careful analysis of requirements in diverse scenarios, and many of the requirements will not be obvious in advance. For example, humans can hear someone approaching a door from the corridor outside. Will Fido's hearing need to include that sort of capability? Humans can understand a speaker of their own language of almost any age and either gender, in a variety of emotional states, including shouting in anger, whispering in order not to wake the baby, or speaking while sobbing. Will we require that sort of flexibility in a domestic robot, or will it suffice to train it to cope with a limited range of speakers speaking with some care?

In CoSy we probably cannot expect to have unrestricted speech input, and for some tasks may need to use typing or a mixture of typing and graphical pointing via the screen.

Vision

One of the hardest problems for the design of intelligent human-like systems is to be clear what the functions of vision are. The vast majority of vision researchers (if not all) seem to make very specific assumptions about the role of vision.

  1. Some people focus on segmentation, recognition and classification, where that means attaching labels to images or portions of images as a result of some sort of training, often without any attempt to interpret anything in terms of 3-D structure. That is one of several applications of statistical methods, which also include tracking or prediction of what is likely to happen next.

    We can certainly assume that Fido will have to be able to do those things, but they are relatively minor (shallow) aspects of vision, and far more will be required for a human-like or animal-like visual system, especially capabilities related to spatial structure.

  2. Others such as David Marr assumed that the function of vision was to segment a scene into 3-D objects and find their locations, pose, shape, motion, colour, texture and possibly other physical properties and relationships. This included finding the orientations of visible surfaces.

  3. J.J. Gibson produced a theory that implied that the functions of vision were much less concerned with information about objective, observer-independent physical properties. Instead he emphasised affordances, which are different for different kinds of animals, and might be different at different times, even for the same animal. On this view the function of vision in an animal (and presumably a robot) is to provide information about which of the actions that the animal is capable of performing, and that might be relevant to its goals or preferences, are or are not possible, and what needs to be done to make them possible. The information provided by a vision (or other perception) system is then not simply about the contents of the environment but about relations between at least the following

  4. Those who emphasise a 'dynamical systems' approach to the study of intelligence emphasise the role of perception, and vision in particular, in fine-grained or continuous control of actions, including catching things, avoiding things, lifting things, throwing things, etc. Where control is continuous, differential equations may be used to represent relationships between sensory states and motor control states.

  5. Philosophers have thought of vision and other senses as providing information relevant to generating and testing beliefs about regularities in the environment, e.g. unsupported objects fall, things that look like apples are good to eat, the sun goes round the earth, ... Less sophisticated versions focus only on correlations between different sensory data. More sophisticated versions of this assume that perception, possibly augmented with additional devices such as measuring instruments, can be relevant to far more abstract theories, e.g. about the relations between different physical forces, or the atomic structure of different substances.

  6. Another use that is part of our everyday life is communication: we read many written statements, questions, stories, instructions, equations, tables of numbers, etc. We also see many other things as representing some meaning, including maps, flow-charts, diagrams in proofs, etc. We also visually see intentional gestures as communications.

  7. Closely related to the previous point is our ability to see states of other minds by 'reading' involuntary expressions, including facial expressions, postures and various forms of movement and eye gaze. It is significant that we see, rather than merely infer, happiness, sadness and other mental states. That presupposes that the visual system has access to representations of non-physical phenomena.

  8. A quite different abstract function of perception is to provide understanding of how things work. This happens for example when a clock is opened up and we see how the various movements are causally linked. This kind of function of perception is discussed in this presentation on kinds of causality.

    This seems to be closely related to human abilities to reason mathematically, especially using diagrams and other visual aids, for instance using maps to reason about routes. It is also relevant to uses of vision in designing new machinery, designing new algorithms and many other design activities.

At this stage it is not clear how many of these functions of vision will be needed in a domestic robot like Fido. It is very likely that eventually all of them will be found in robots. But for now we can attempt to identify a subset of visual abilities that are not yet achieved by robots and which will be of general use if ways can be found to implement them.


Representations of objects, especially shape

One of the hard problems in specifying requirements is to be clear what sort of information perception should provide about physical objects.

Location:
Some things are relatively clear, such as the requirement to be able to represent where an object is in space, though that leaves open whether that means providing

Some of the maps may actually be best thought of as interlinked networks of routes (as many transport maps are). For animals that lack vision or mostly live in underground tunnel networks, it may be that such 'route maps' form the only representation of spatial relationships, apart from the temporary relations involved in manipulating (e.g. eating) small objects. Perhaps that is also true of a young child's representation of a house, or the representations created by adults of large buildings in which they live or work, e.g. a hotel, hospital or office block, for which they have no usable global 3-D representation, only a collection of routes between the places actually visited in the building. For Fido that sort of partial representation may be all that is available during learning about a new building, unless the robot has access to a database of richer information about the building.
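
As a rough illustration of the 'route map' idea, the sketch below stores only places and the routes directly travelled between them, with no global coordinates, and finds a way between two places by breadth-first search. The RouteMap class, its method names and the hotel place names are all hypothetical, invented for this example; nothing here is taken from CoSy software, and a real robot would need far richer information about each route.

```python
from collections import deque


class RouteMap:
    """Places linked by routes actually travelled; no global 3-D layout."""

    def __init__(self):
        self.links = {}  # place name -> set of directly reachable places

    def add_route(self, a, b):
        # Assumption: every route can be travelled in both directions.
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def find_route(self, start, goal):
        """Return a shortest list of places from start to goal, or None."""
        if start not in self.links or goal not in self.links:
            return None
        previous = {start: None}  # place -> place it was first reached from
        queue = deque([start])
        while queue:
            place = queue.popleft()
            if place == goal:
                # Walk back through 'previous' to rebuild the route.
                route = []
                while place is not None:
                    route.append(place)
                    place = previous[place]
                return route[::-1]
            for neighbour in sorted(self.links[place]):
                if neighbour not in previous:
                    previous[neighbour] = place
                    queue.append(neighbour)
        return None


# Hypothetical hotel known only through the routes walked so far.
hotel = RouteMap()
hotel.add_route("lobby", "lift")
hotel.add_route("lift", "corridor 3")
hotel.add_route("corridor 3", "room 312")
hotel.add_route("lobby", "restaurant")

print(hotel.find_route("restaurant", "room 312"))
# ['restaurant', 'lobby', 'lift', 'corridor 3', 'room 312']
```

A place that has never been visited simply does not appear in the map, so a request for a route to it returns None. That mirrors the partial knowledge described above.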

Shape as mediator between perception and action:
Jeremy's document on requirements points out the need for the PlayMate to understand relations between actions and perceived shape and relationships: if a block is against a wall, understanding its shape includes understanding how it will respond to forces applied to different parts of the surface of the block in different directions. This is a special case of the sensory-motor contingencies studied in the CNRS group.

This can be contrasted with attempts at constructing ontologies for objects that assume that all spatial structure can be expressed as part-whole hierarchies. The use of such hierarchies ignores the fact that a typical natural object, such as a rock, a tree-trunk or a human body, has no unique decomposition into a tree of parts. Any such decomposition will capture only a small subset of relationships between parts, focusing mainly on permanent relationships, making it hard to express, for example, the fact that the end of a person's left little finger is in his left ear.
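
The limitation can be made concrete with a small sketch. It assumes one possible (and by no means unique) decomposition of a body into a tree, stored as child-to-parent links, plus a separate set of relation triples. The part names, the "inside" relation and the helper functions are hypothetical choices made for this illustration only.

```python
# One assumed part-whole hierarchy: each part has exactly one parent.
part_of = {
    "left little finger tip": "left little finger",
    "left little finger": "left hand",
    "left hand": "left arm",
    "left arm": "body",
    "left ear": "head",
    "head": "body",
}

# Further relations, stored as (subject, relation, object) triples.
# They may be temporary and may link parts in different branches of the tree.
relations = {
    ("left little finger tip", "inside", "left ear"),
}


def ancestors(part):
    """Follow part-of links from a part up to the root of the hierarchy."""
    chain = []
    while part in part_of:
        part = part_of[part]
        chain.append(part)
    return chain


def related(subject, relation, obj):
    """Check whether a specific relation triple is currently recorded."""
    return (subject, relation, obj) in relations


# The tree alone says only that both parts belong to the same body.
ear_chain = ancestors("left ear")
shared = [p for p in ancestors("left little finger tip") if p in ear_chain]
print(shared)  # ['body']

# The transient contact has to be recorded outside the hierarchy.
print(related("left little finger tip", "inside", "left ear"))  # True
```

Even this toy version needs a second structure alongside the tree, and the triples would have to be added and removed as the person moves.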

Can we say what Fido will see in arbitrary objects in a domestic context?
If we ask what is common to clothing, blankets, food and drink of various kinds, containers, kitchen utensils and gadgets, furniture, wall-fittings, doors, carpets, windows, curtains, animals, humans, spaces, regions, routes and other things that Fido will perceive, it may at first seem that the answer is very little. But perhaps we can find the right sort of generality by moving to the right level of abstraction. One way of thinking about that is to ask about the dimensions in which information about objects can vary, and then fit objects into appropriate categories within different dimensions. The complication is that we can probably find no useful dimensions that are completely independent of one another (orthogonal).


Implications for CoSy

The above discussion of long term requirements for domestic robots provides an indication of many of the difficulties that lie ahead.

As far as CoSy is concerned we can expect only a tiny subset of these capabilities to be provided by August 2008. Some examples can be found here.

