Last Updated: 19 Jan 2008
At present the PlayMate robot, illustrated in this video prepared for the CoSy review in November 2007, is very unreliable. The robot succeeds in achieving its goal less than 50% of the time, though that is not shown in the video, of course. Such failures could be due to inaccuracy in the production of movements specified by motor control signals; but since the robot uses the Katana robot arm, which provides very precise control, that is not the source of the PlayMate's problems. The difficulties arise mainly from the quality of the information the robot has about the current state of the environment. If the visual system were able to provide exact 3-D locations of every point of every surface of objects in the scene, including the robot's own hand, then in principle the planning and motor control subsystems could produce plans, and motor signals based on those plans, that enable the robot to achieve its goals (apart from problems such as objects slipping in the gripper when lifted, which are not a serious problem in the current scenario). The poor quality of information available for planning and motor control has three main aspects:
- Inadequacy of current visual systems, which typically fail to find all the important detail in images, and fail to provide accurate 3-D information on the basis of current stereo algorithms;
- Inadequacy of the form of representation used, which does not allow important qualitative structures and qualitative relationships implied by the sensory information to be represented;
- Inadequacy of current architectures, which do not allow the system to detect and deal with problems in what it knows.

There are two very different ways to try to improve the performance.
Servoing vs 'Sense-decide-act' cycle
It is often assumed by AI researchers that intelligent systems have to make use of a repeated three-stage sequence of processes:
- acquire information about the environment via sensors (including checking predictions of effects of actions last performed);
- process information, decide what to do next, and make predictions about the consequences of doing it;
- perform the selected action.
However, many control systems in which control signals and sensing are performed continuously are incompatible with this model: the components of the alleged sequence are instead performed in parallel. It is obvious that humans and many animals do not fit the sense-decide-act model, and work on the Birmingham CogAff project has, since around 1991, assumed that an architecture is needed in which at least 9 different types of process are performed concurrently -- though without making the control engineer's assumption that those processes are all continuous and of a type for which differential equations form a good representation. Some may be continuous and some not, including possibly 'alarm' mechanisms that monitor mechanisms of other sorts and have the ability to freeze, modulate, redirect, or abort processes of all sorts.
The CogAff Architecture Schema allows for interactions between many different sorts of concurrently active processes, some continuous, some discrete, including fast-acting 'alarm' mechanisms triggered by trainable pattern recognition processes. In particular, the notion of servo control, which normally assumes continuous (analogue) information processing, can be generalised to include visual servoing, which includes discrete processes of high-level perception, goal-generation, goal processing, planning, decision making, self-monitoring, learning, and initiating new actions, along with continuous control of movements and sensing of actions and environmental changes. This paper assumes that such an architecture is available, and outlines a hypothesis that some of the information processing that could be useful for a robot (or animal) manipulating objects in the environment uses visual servoing and other kinds of servoing, based partly on predicting changes in two kinds of affordances.
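As a rough illustration of the kind of concurrency assumed here, the following Python sketch runs a continuous servo-style loop and a separate 'alarm' monitor in parallel, with the alarm able to abort the ongoing action. It is only an illustrative sketch of the schema, not an implementation of the CogAff architecture; the threads, timings, and names (servo_loop, alarm_monitor) are assumptions introduced for the example.

    # Minimal sketch of the kind of concurrency assumed above: a continuous servo
    # loop and a fast 'alarm' monitor running in parallel, with the alarm able to
    # abort the loop.  Illustrative only; not part of any existing CogAff code.

    import threading, time

    abort = threading.Event()          # set by the alarm mechanism

    def servo_loop():
        """Continuous control: sense and emit motor signals at a high rate."""
        while not abort.is_set():
            # sense(); emit_motor_signal()   -- placeholders for continuous control
            time.sleep(0.01)

    def alarm_monitor():
        """Fast pattern-triggered alarm: here it simply fires after 0.2 seconds."""
        time.sleep(0.2)                # stand-in for detecting an alarming pattern
        abort.set()                    # freeze/abort the ongoing action

    threads = [threading.Thread(target=servo_loop), threading.Thread(target=alarm_monitor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("servo loop aborted by alarm")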
Three linked sub-hypotheses
The hypothesis can be subdivided into the following:
1. The robot's reliability in performing manipulative tasks can be
increased substantially by giving it the following new "cognitive
servoing" competences (and probably others of the same general kind,
still to be specified):
o The ability to detect what it is and is not sure about
-- whether it is sure about properties and relations perceived
in the scene
-- whether it is sure about predictions it makes about effects
of its actions.
o The ability to detect that performing certain actions will
provide missing information, e.g. moving a block to one side
will allow the full width of another block to be seen, or moving
the camera to one side will allow the full width to be seen.
o The ability to move out of regions of uncertainty when it is on
a "phase boundary" between being sure that something is true and
being sure that it is false, for example, boundaries between:
- being sure that it can estimate the size of something, or sure
  that it cannot;
- being sure that its hand is currently moving in the right
direction to achieve a sub-goal, or sure that it is moving in
the wrong direction;
- being sure that an object is narrow enough in a certain
dimension for it to be graspable by the robot, or sure that it
is not narrow enough.
o The ability to use 2-D projections of scene structures to reason
qualitatively about which way to move in order to move away from a
phase boundary into a region of certainty -- i.e. the ability to
"reason with imagined diagrams" in order to solve problems
related to planning and controlling actions. (Examples are given
below.)
o The ability to use all of the above as part of the process of
"visual servoing" so as to detect and correct slight
mis-alignments or mis-locations of the hand while moving in
order to perform some task.
2. This requires the robot to be able not only to predict physical
and geometrical changes that will result from its actions but also
to predict and reason about something more abstract:
changes in affordances.
In particular we distinguish the ability to predict changes in
action affordances from the ability to predict changes in
epistemic affordances, both discussed below.
Many people studying affordances have noticed that they are related
to actions (or more generally to processes) that can produce some
physical change in the environment. What is not so often discussed
is that there are many changes in the environment that change the
affordances in the environment. It is also not always noticed that
whereas the main focus of investigations of affordances has been on
what physical actions can be performed, there are also important
issues concerned with what might be called "epistemic affordances"
or "cognitive affordances" i.e. affordances for an animal or robot
concerned with information that is or is not available to that
individual.
Some people have discussed this, e.g.
http://research.cs.vt.edu/usability/projects/uaf%20and%20tools/affordance.htm
Physical and cognitive affordances help users perform physical
and cognitive actions, respectively. We agree with Norman that these
two kinds of affordance are not the same. They are essentially
orthogonal concepts, but we think they both play very important
roles. The reason for our giving them new names is to provide a
better match to the kinds of actions they help users make during
their cycle of interaction. A physical affordance is a design
feature that helps, aids, supports, or facilitates physically doing
something, and a cognitive affordance is a design feature that
helps, aids, supports, or facilitates thinking and/or knowing about
something.
Author not specified
HCI and Usability People at Virginia Tech
So, physical actions or processes can change not only the available
action affordances, they can also change epistemic affordances --
e.g. what can be perceived, felt, heard, etc. allowing the
individual to obtain new information or, in the case of negative
affordances, obstructing access to information.
So both action affordances and epistemic affordances can be changed
when something moves in the environment and that means that the
possibilities for those movements are related to possibilities for
adding, removing or modifying action and epistemic affordances.
We can refer to affordances for producing or modifying affordances as
"meta-affordances". This paper introduces examples and discusses
ways in which meta-affordances can be used in predicting how actions
or other events will change affordances.
A particularly important class of actions that can affect epistemic
affordances is the set of changes of view point or view direction,
but there are many others, including moving an object to make
something more visible. Besides epistemic affordances related to
vision, there are others related to other sensory modalities, but
not much will be said about that here.
I believe that this discussion is closely connected to other CoSy
discussion papers concerned with the need for exosomatic, amodal
ontologies and limitations of the use of sensorimotor contingencies
as a means of representation, but that will have to be discussed in
another paper.
For more on that topic see
http://www.cs.bham.ac.uk/research/projects/cosy/papers/#dp0601
COSY-DP-0601 (HTML file): Orthogonal Recombinable Competences
Acquired by Altricial Species
http://www.cs.bham.ac.uk/research/projects/cosy/papers/#dp0603
COSY-DP-0603 (HTML): Sensorimotor vs objective contingencies
This is a first draft discussion of some of the ways in which the
PlayMate scenario might be extended to include acquisition and use
of meta-affordances, concerned especially with predicting affordance
changes.
There are some proposals for using these ideas for dealing with
uncertainty by identifying "phase boundaries" between regions of
certainty regarding affordances, and keeping away from those phase
boundaries to avoid uncertainty.
Consider holding a pen in the vicinity of a mug resting on a table
with nothing else nearby on the table. Depending on where the pen
is, what its orientation is, and how you are holding it, there will
be different possibilities for motion of the pen, with different
consequences. There will also be different possibilities for
obtaining information about some or all of the pen, or the mug, or
about the relationship between them.
For example, if you are holding the pen horizontally above the mug,
centred on the mug's vertical axis, then if you try moving the pen
down the motion will be limited by the rim of the mug. However,
there are several actions that will make it possible to move the pen
to a lower level, including these:
o moving the pen horizontally in various directions until no
part of the pen is above the mug, after which it will be
possible to move the pen down to the table.
o rotating the pen until its axis is vertical, after which it
will be possible to move the pen vertically downwards until
it hits the bottom of the mug.
Those are examples where an action (horizontal movement, or rotation
about a horizontal axis) produces a new state in which changed
affordances allow additional actions (downward vertical movement).
Other movements will restrict the actions possible. E.g. if the pen
is pushed horizontally through the handle of a mug and the mug is
fixed, that will restrict possibilities for movement of the pen in
any direction perpendicular to its long axis.
There are also changes that will alter the information-gaining
affordances. For example, if the pen is oriented vertically and only
the portion projecting above the mug is visible there are many
questions you will not be able to answer on the basis of what you
can see, e.g.
o How long is the pen?
o Is the invisible part of the pen inside the mug?
o If the pen is moved horizontally to left or right, or moved
horizontally further away from you, will its movement be
obstructed by part of the mug?
Such unavailable information can often be made available either by
moving something in the scene or by changing the viewpoint.
For example, lifting the pen vertically can change the situation so
that the first question can be answered. The second and third
questions could be answered either by moving to look down from a
position above the mug or by moving the viewing position sideways
horizontally and viewing the mug and pen from some other positions.
The problem discussed in this paper is: what are the ways in which
by performing an action an agent can change not just the physical
configurations that exist in the environment, but also the
affordances that are available to the agent, including both action
affordances and epistemic affordances (i.e. affordances for
gaining information).
Humans (though probably not infants or very young children), and also, I suspect, some other animals, are able to perceive scene structure in such a way as to support reasoning about how to change things so as to alter affordances. This competence includes the ability to predict how continuing a perceived or imagined movement will change the constraints on further movements.
The pictures below illustrate some of the constraint changes that can be predicted. If all this is correct, then one of the previously unnoticed (?) functions of a vision system is to be able, when seeing a movement of an object in the vicinity of another object, to predict that IF that movement continues THEN the relationships between the two objects will (or will not) change in specific ways so as to restrict or allow further movements (seeing changing action affordances), or so as to restrict or allow further information acquisition (seeing changing epistemic affordances). Similar reasoning should be applicable to consequences of possible motions as opposed to actual motions. This is relevant to both the CoSy PlayMate scenario and the CoSy Explorer scenario. [See also the KR '96 paper "Actual Possibilities".]
Current AI systems, if they can do such things at all, will probably either use some sort of logical formalism to represent states of affairs and actions, performing the tasks by manipulating those representations, e.g. as a planner or theorem prover does, or use some probabilistic mechanism such as forward propagation in a neural net or some sort of Markov model. Either way, states will be represented by a logical or algebraic structure, such as a predicate applied to a set of arguments, or a vector of values, and predictions will involve constructing or modifying such structures. The abilities described and illustrated below seem to involve the use of a different sort of mechanism: one that makes use of 'analogical' representations in the sense defined in (Sloman 1971), discussed as an example of the use of an internal GL (Generalised Language) in this presentation on evolution and development of language. This ability to reason about how affordances change as a consequence of changing locations, orientations, and relationships of objects also provides illustrations of the notion of Kantian causal competence, contrasted with Humean causal competence in presentations by Chappell and Sloman here. The important point about such reasoning, apart from the fact that it is visual reasoning that uses analogical representations, is that the reasoning is geometric, topological and deterministic, in contrast with mechanisms that are logical or algebraic and probabilistic.
Detecting whether motion restrictions are present or not, or whether
a continued motion will produce new restrictions or remove old ones,
can be done with considerable confidence in VERY MANY cases even
when images/videos are noisy and when accurate metrical information
cannot be extracted from them.
That is because the nature of such restrictions, e.g.
A prevents the motion of B from continuing
does not depend on precise metrical relationships between
objects, their surfaces, and their trajectories. Instead, much
coarser-grained relationships, using relatively abstract spatial
information, especially topological information and
ordering information (e.g. A is between B and C), suffice
for most configurations.
For example, if the point of a pen is within the convex hull of an
upward facing mug then the material of the mug will eventually
constrain horizontal and downward motion if the pen moves, but not
upwards motion.
The word 'eventually' is used in order to contrast predicting
exactly how much the object can be moved before contact
occurs with predicting that contact will occur e.g. before
the pen point has reached a target location outside the mug. I.e.
the prediction is that a boolean change will occur (some
relationship between objects will change from holding to not
holding), but not exactly where or when it will change. That
prediction does not involve high precision, but is sufficient to
indicate the need to lift the pen before moving it horizontally
far beyond the width of the mug.
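To make the point concrete, here is a minimal Python sketch of this kind of boolean prediction for the pen-and-upright-mug case, using only containment and ordering relations in a 2-D vertical cross-section rather than precise metrical information. The representation and all the names (MugSection, PenTip, will_constrain) are assumptions introduced for illustration, not part of any existing PlayMate code.

    # Minimal sketch of qualitative contact prediction for the pen-in-mug example.
    # Only containment and ordering relations in a 2-D vertical cross-section are
    # used, not precise metrical information.

    from dataclasses import dataclass

    @dataclass
    class MugSection:
        left: float      # x of left inner wall
        right: float     # x of right inner wall
        bottom: float    # y of inside bottom
        rim: float       # y of rim (top edge)

    @dataclass
    class PenTip:
        x: float
        y: float

    def inside_opening(tip: PenTip, mug: MugSection) -> bool:
        """Is the tip horizontally between the walls and vertically below the rim?"""
        return mug.left < tip.x < mug.right and mug.bottom < tip.y < mug.rim

    def will_constrain(tip: PenTip, mug: MugSection, direction: str) -> bool:
        """Predict whether continued motion in 'direction' will EVENTUALLY be blocked
        by the mug.  The answer is boolean: it says that contact will occur, not
        where or when, so no precise distances are needed."""
        if not inside_opening(tip, mug):
            return False                  # outside the opening: the mug imposes no limit here
        if direction in ("left", "right", "down"):
            return True                   # walls or bottom will eventually block the motion
        if direction == "up":
            return False                  # upward motion can continue past the rim
        raise ValueError(direction)

    # Example: pen tip inside an upright mug.
    mug = MugSection(left=0.0, right=8.0, bottom=0.0, rim=10.0)
    tip = PenTip(x=4.0, y=6.0)
    for d in ("left", "right", "down", "up"):
        print(d, will_constrain(tip, mug, d))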
If the mug is lying on its side, and the pen is horizontal with the
point in the mug, then the mug constrains vertical movements and
some, but not all, horizontal movements. For example, a horizontal
movement bringing the pen out of the mug is not constrained, whereas
a horizontal movement in the opposite direction into the mug will
eventually be constrained -- when the pen hits the bottom of the
mug. (The bottom surface is vertical because the mug is lying on its
side.)
A robot that understands its environment needs to be able to
perceive such constraints and use them both in planning future
actions and in controlling current actions: e.g. ensuring that the
movement will bring about a desired change in constraints by
adjusting the direction of motion or the orientation of one of the
objects.
In very many cases there is no need for very precise control (e.g. below a few cm., or within a few degrees). The actual precision required depends on the task: predicting whether a ball thrown towards a bin at the far end of the room will go into the bin requires far more precision than predicting whether letting go of the ball when it is held close to a mug will cause it to enter the mug.
Predicting some of the changing affordances that will result from continuation of a perceived movement is very often quite easy because they depend only on topological relationships or very crude metrical relationships. The exceptions occur when objects are close to 'phase transitions' e.g. close to the boundary of a convex hull of a complex object, or close to a plane through a surface or edge. In those special cases it is often hard to make binary classifications that are easy in the vast majority of cases. But it is usually easy to make a small movement that will turn a hard problem into an easy one.
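The following sketch, in the same assumed 2-D cross-section representation as above, illustrates the idea: a coarse margin is used to classify a configuration as surely inside, surely outside, or near a phase boundary, and in the hard case a small displacement is suggested that moves the configuration into one of the easy regions. The margin value and the function names are illustrative assumptions.

    # Minimal sketch of detecting a 'phase boundary' and moving away from it.
    # A prediction is treated as easy only when the pen tip is well clear of the
    # boundary of the relevant region (here, the mug opening); otherwise a small
    # sideways move is preferred to a finer measurement.

    def classify(tip_x: float, left: float, right: float, margin: float = 1.0) -> str:
        """Classify the tip's horizontal relation to the opening [left, right]."""
        if left + margin < tip_x < right - margin:
            return "surely inside"
        if tip_x < left - margin or tip_x > right + margin:
            return "surely outside"
        return "near phase boundary"      # hard case: precise data would be needed

    def suggested_move(tip_x: float, left: float, right: float, margin: float = 1.0) -> float:
        """Return a small x-displacement that moves the tip away from the boundary,
        or 0.0 if the current configuration is already an easy case."""
        if classify(tip_x, left, right, margin) != "near phase boundary":
            return 0.0
        centre = (left + right) / 2.0
        # Move towards whichever clear region is nearer: the centre of the opening
        # or the outside of the nearer wall.
        if abs(tip_x - centre) < min(abs(tip_x - left), abs(tip_x - right)):
            return centre - tip_x
        return -2 * margin if tip_x <= centre else 2 * margin

    print(classify(4.0, 0.0, 8.0))        # surely inside
    print(classify(7.8, 0.0, 8.0))        # near phase boundary
    print(suggested_move(7.8, 0.0, 8.0))  # a small move that makes the case easy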
Figure 1 below illustrates this with some examples. The diagram
represents various possible configurations involving a pencil and a
mug on its side, along with possible translations or rotations of
the pencil, indicated by arrows.
Figure 1
Questions relating to Figure 1
Assume that all the pencils shown in the figure lie in the vertical plane through the axis of the mug, so they are all at the same distance from the viewer, as is the axis of the mug. For each starting point and possible translation or rotation of the pencil we can ask questions like: will it enter the mug? will it hit the side of the mug? will it touch the rim of the mug? In some cases the answer is clear. In cases where the answer is uncertain, because the configuration is on the "phase boundary" between two classes of configurations that would have clear answers, we can ask how the pencil could be moved or rotated to make the answer clear. (Compare being unsure whether you are going to bump into something while walking: you can either try to look more carefully, use accurate measuring devices, compute probabilities, etc., or you can alter your heading to make sure that you miss the object.)
TO BE EXTENDED
The ability to answer such questions is required for PlayMate's ability to plan movements. The same comment applies to questions below.
As illustrated above, when predictions need to be made, an intelligent agent can move the object away from the 'difficult' position or trajectory so that it is far enough from the phase transition for fine control or precise predictions not to be required. In some cases where being close to a phase transition makes a perceptual judgement difficult (e.g. will an object's motion lead to a collision?) it is possible to resolve the ambiguity by a change of viewpoint. Moving to one side, for example, may alter one's view of a gap so that it becomes clear whether the gap is big enough for an object to fit in it with space to spare. Some simple examples of problems requiring a change of viewpoint are given below. Similar comments apply to relations not between objects but between their trajectories.

The exceptional cases are hard to deal with, but very many cases are easy, without requiring great precision, because they concern topological or ordering relations rather than metrical information, and a change of viewpoint or slight modification of a trajectory may turn a difficult prediction into an easy one.

Another type of exception is related to the fact that in the 'easy' cases discussed above movements can be visualised in advance with accuracy sufficient for the task of deciding what will happen, and they can also be performed ballistically, without fine-grained feedback control. A different sort of situation occurs when the object being acted on is very small, e.g. it takes up a relatively small portion of the visual field, and relatively small changes in motor signals can make the difference between a finger making contact with the object or not. Using a small tool, e.g. small tweezers, to manipulate such objects requires additional competences beyond those discussed above. Such actions, which probably develop later and involve fine-grained visual servoing to control very precise small movements, are ignored here.
Much work in the Birmingham Cognition and Affect project has been concerned with the role of a 'meta-management' layer in an agent architecture, namely a layer of mechanisms providing various kinds of self-monitoring and self-control of internal states and processes. There are several presentations on varieties of architectures, explaining such ideas, here. A relatively simple tutorial is included in this presentation on robotics and philosophy. See also the remarks about fully deliberative architectures here.

Meta-management capabilities are required for dealing with the problems of uncertainty mentioned above. An individual trying to predict how affordances will be changed if an action is performed needs to be able to detect when that prediction is hard because the objects and trajectories are close to a 'phase boundary', so that the prediction can be made reliably only if precise, noise-free information is available. If such situations are detected, using a meta-management mechanism to evaluate the quality of current information, then working out how to change the situation so that the problem is removed, e.g. by moving or rotating an object so as to take it further from the phase boundary, can use a deliberative mechanism if the situation is unfamiliar, or a learnt reactive behaviour if the situation is familiar.
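A possible shape for such a meta-management check is sketched below in Python, under the assumption that separate mechanisms exist for direct prediction, learnt reactive adjustment, and deliberation. The three strategies and all the names are illustrative, not part of any existing CogAff implementation.

    # Minimal sketch of a meta-management check of the kind described above: the
    # monitor evaluates whether the current prediction problem is 'easy' (far from
    # a phase boundary) and dispatches accordingly.

    def meta_manage(problem, is_near_phase_boundary, is_familiar,
                    predict, reactive_adjust, deliberate_new_plan):
        """Choose a strategy for the given prediction problem.

        is_near_phase_boundary -- callable: problem -> bool (quality-of-information check)
        is_familiar            -- callable: problem -> bool (has a trained response)
        predict                -- used in the easy case
        reactive_adjust        -- learnt behaviour that nudges the configuration
        deliberate_new_plan    -- deliberative mechanism for unfamiliar hard cases
        """
        if not is_near_phase_boundary(problem):
            return ("predict", predict(problem))            # easy case: answer directly
        if is_familiar(problem):
            return ("reactive", reactive_adjust(problem))   # hard but familiar: trained nudge
        return ("deliberate", deliberate_new_plan(problem)) # hard and unfamiliar: plan a change

    # Example with trivial stand-ins for the component mechanisms:
    result = meta_manage(
        {"tip_x": 7.8},
        is_near_phase_boundary=lambda p: abs(p["tip_x"] - 8.0) < 1.0,
        is_familiar=lambda p: False,
        predict=lambda p: "no collision",
        reactive_adjust=lambda p: "shift pen 2cm right",
        deliberate_new_plan=lambda p: "plan a viewpoint change or object move",
    )
    print(result)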
The pictures below are somewhat idealised 'hypothetical' snapshots
of situations in which motion can occur. Questions are asked about
the pictures to illustrate some of the requirements for visual
understanding of perceived structures. The examples add a
requirement that was not included in the previous examples, namely a
requirement to understand implications of things being at different
distances from the viewer. However the scenes involve 2.5D
configurations, i.e. the depth relations are merely orderings,
without any metric.
Figure 2
What should a vision program be able to say about the above images
(A), (B), (C), (D), each involving a mug, a horizontal pen, and two
rigid vertical cards, if asked the following questions in each case:
Figure 3
What should a vision program be able to say about the above images
(A), (B), (C), (D), each involving a mug, a pen, and two rigid
vertical cards, if asked the following questions in each case:
Figure 4
What should a vision program be able to say about the scene depicted
in Figure 4?
Are there any actions a robot could take to shed light on what's
going on?
Compare:
http://www.cs.bham.ac.uk/research/projects/cosy/photos/penrose-3d/
Pictures based on the work of Oscar Reutersvärd (1934)
When a robot or animal is controlling its own motions, there are many examples of prediction of consequences of movement that are related to but different from the examples given. E.g. as the eye or camera moves forward, the location of some object within the visual field indicates whether continued motion in a straight line will cause the eye to come into contact with the object or move past it. Slightly more complex reasoning is required to tell whether a mouth or beak that is rigidly related to the eyes will be able to bite the object. That situation is analogous to the camera mounted on the PlayMate's arm, near its wrist, as shown here.

For example, consider the problem of using camera images to control the motion of the hand with a wrist-mounted camera, when an object is to be grasped, or using eyes mounted above a mouth, when an object is to be grasped with the mouth. Here are two schematic (idealised) images representing a pair of snapshots that might be taken from a camera mounted vertically above the wrist and pointing along the long axis of the gripper.
One of the images is taken when the gripper is still some way from the block to be grasped and the other is taken when the gripper is lower down, closer to the block. It should be clear which is which. Now, if the camera is mounted above the gripper, is the gripper moving in the right direction? For the robot to use the epistemic affordance here it has to be able to reason about the effects of its movements on what it sees, and how the effects depend on whether it is moving as intended or not. It is possible that instead of explicit reasoning (of the sort you have probably had to do to answer the question) the robot could simply be trained to predict camera views and to constantly adjust its movements on the basis of failed predictions. In one case it needs explicit self-knowledge, which can be used in a wide variety of circumstances, and in the other case it needs implicit self-knowledge, produced by training, which is applicable only to situations that are closely related to the training situations.

A human making use of the epistemic affordance by reasoning about the information available from the differences between the two views may make use of logic, a verbal language, and perhaps some mathematics. A less intelligent animal or robot may have that information pre-compiled (e.g. into neural control networks) by evolution or previous training and available for use only in very specific control tasks. Is there some intermediate form in which the information could be represented and manipulated that could be used by an intelligent animal to deal with novel situations, and which does not depend on knowing logic or a human-like language, but might make use of what we have been calling a GL (a Generalised Language), which has structural variability and compositional semantics and may involve manipulation of representations of spatial structures? See:
http://www.cs.bham.ac.uk/research/projects/cosy/papers/#tr0703
Computational Cognitive Epigenetics (Sloman and Chappell, to appear in BBS 2007)

In all cases visual servoing requires what could be described as 'self-knowledge', insofar as it involves explicit or implicit knowledge about the agent's situation and actions that can be used to make predictions, to interpret discrepancies between predicted and experienced percepts, and to use those discrepancies to alter what it is doing. But this does not require an explicit sense of self, if that implies that the robot (or animal, or child learning how to bite things or grasp things) is able to formulate propositions about its location, its actions, its percepts, its goals, etc.
Video input from a real camera will be far more complex, noisy and
cluttered than the idealised line drawings depicted above. As a
result it will be difficult to locate the edges, corners, axes,
centroids, etc. of image components accurately, or to compute
distances or angles between them accurately.
One way of dealing with that is to attempt to estimate the
uncertainty, or the probability distributions of particular
measures, and then to develop techniques for propagating such
information in order to answer questions about what is going on in
the scene, where the answers will not use precise measures but
probability distributions.
Another way is to find useful higher level, more abstract
descriptions, whose correctness transcends the uncertainty regarding
the noisy image features. So for example, the change between the
left and right images above could be described something like this
(though not necessarily in English):
In the second picture, the image of the target object is larger
and higher in the field of view.
The uncertainty and noise in the image can be ignored at that level
because all the uncertainty in values in the images is subsumed by
the above description. The description does not say what the exact
sizes of the images are in the two pictures, or the exact locations,
or the exact amount by which the image is larger or further from the
bottom edge.
So, since the gripper is below the camera, the fact that the image is
moving up the field of view means that the direction of motion of
the gripper is towards a point below the target, requiring the
motion to be corrected by moving the wrist up. Exactly how much it
should move up need not be specified if the motion is slow enough and
carefully controlled to ensure that the target object moves towards
a location that has previously been learned as the place where it
should be for the gripper to engage with it. If the gripper fingers
are moved far enough apart the location need not be precise, and if
there are sensors on the inner surface of the fingers they can
provide information about when the object is between the fingers and
the grip can be closed.
This description is over-simplified, but will suffice to illustrate
the point that there is a tradeoff between precision of description
and uncertainty and that sometimes the more abstract, less precise,
description is sufficiently certain to provide an adequate basis for
deciding what to do.
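A minimal sketch of that trade-off follows: the servoing correction below uses only orderings -- whether the target's image has grown and whether it has moved up or down in the field of view -- rather than precise measurements. The camera is assumed to be mounted above the gripper and pointing along its axis, and the function name and correction labels are assumptions introduced for illustration.

    # Minimal sketch of the qualitative visual-servoing correction described above.
    # Only orderings are used: whether the target's image has grown, and whether it
    # has moved up or down in the field of view, not by how much.

    from dataclasses import dataclass

    @dataclass
    class TargetImage:
        size: float      # apparent size of the target in the image (any monotonic measure)
        height: float    # vertical position of the target in the field of view

    def servo_correction(before: TargetImage, after: TargetImage) -> str:
        """Suggest a coarse correction from two successive views of the target."""
        if after.size <= before.size:
            return "not approaching: re-plan the reach"
        if after.height > before.height:
            # Image moving UP the field of view: the gripper (below the camera)
            # is heading for a point below the target, so raise the wrist.
            return "approaching, aiming low: move wrist up slightly"
        if after.height < before.height:
            return "approaching, aiming high: move wrist down slightly"
        return "approaching on course: continue"

    print(servo_correction(TargetImage(size=1.0, height=0.40),
                           TargetImage(size=1.6, height=0.55)))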
Birds that build nests out of twigs, leaves and similar materials need to be able in some sense to understand and use changing affordances as they move twigs and other objects around during the construction process. Future domestic robots will also need to have such competences. The abilities to predict changing affordances form a special case of understanding causal relationships, in particular Kantian causal relationships, as discussed in http://www.cs.bham.ac.uk/research/projects/cogaff/talks/wonac
When humans solve the prediction problems described above we seem to
be making use of manipulable models of 2-D structures, containing
parts that can be moved and rearranged, along with the ability to
detect new contact points arising.
Compare Sloman 1971 on the Fregean/analogical distinction:
http://www.cs.bham.ac.uk/research/cogaff/04.html#200407
Brian V. Funt, 1977
WHISPER: A Problem-Solving System Utilizing Diagrams and a Parallel Processing Retina
IJCAI 1977, pp 459-464
http://dli.iiit.ac.in/ijcai/IJCAI-77-VOL1/PDF/077.pdf
Usefully summarised in
Zenon Kulpa
Diagrammatic Representation And Reasoning
Machine GRAPHICS & VISION, Vol. 3, Nos. 1/2, 1994, 77-103
http://www.ippt.gov.pl/~zkulpa/diagrams/Diagres.pap.pdf
See also:
Kulpa's Diagrammatics web page
I think a relatively simple computer implementation could be built
and used as part of a visual reasoner in CoSy, using techniques used
in graphical software for making and editing diagrams, e.g. TGIF,
XFIG, etc.
(Tgif saves all of its diagrams in a logical format, using Prolog.
http://bourbon.usc.edu/tgif/
It can generate 2-D displays from the Prolog specification, and
mouse and keyboard interactions with the display can lead to a new
Prolog specification of the display.)
The hard part will be parsing real visual images to produce the
required 2-D manipulable representations.
Slightly easier will be software to:
1. Manipulate the parsed 2-D images, e.g. by sliding one structure
in a specified direction while leaving other structures
unchanged, or rotating a structure around a specified point
while preserving its shape.
2. Detect consequences of continuous movements of one or more parts
of the diagram, e.g. detecting when a moving circle first comes
into contact with a fixed triangle, or detecting when the bottom
portion of a partially occluded rectangle behind a circle
becomes visible as the rectangle is moved horizontally.
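As a rough indication of how item 2 might be prototyped, the sketch below steps a moving shape through small displacements and reports the first step at which it touches a fixed shape. It assumes the 'shapely' 2-D geometry library merely as a convenient stand-in for the diagram-manipulation machinery discussed above; nothing here commits CoSy to that library.

    # Minimal sketch of item 2 above: slide a moving shape in small steps and
    # detect the first step at which it makes contact with a fixed shape.

    from shapely.geometry import Point, Polygon
    from shapely.affinity import translate

    fixed_triangle = Polygon([(10, 0), (14, 0), (12, 4)])
    moving_circle = Point(0, 1).buffer(1.0)          # a circle of radius 1 centred at (0, 1)

    def first_contact(moving, fixed, step=(0.5, 0.0), max_steps=100):
        """Slide 'moving' by 'step' repeatedly; return the step index and position
        at which it first touches 'fixed', or None if no contact occurs."""
        shape = moving
        for i in range(max_steps):
            if shape.intersects(fixed):
                return i, shape.centroid.coords[0]
            shape = translate(shape, xoff=step[0], yoff=step[1])
        return None

    print(first_contact(moving_circle, fixed_triangle))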
For affordance prediction and the avoidance of phase boundaries it
may be useful to be able to grow a "penumbra" of specified thickness
around the 2-D image projection of any specified object, and then
when an object A moves in the vicinity of object B, detect
(a) when A's penumbra first makes contact with B's penumbra, and where that happens;
(b) when one of the penumbras first makes contact with the other object (inside its penumbra);
(c) when A itself first makes contact with another object (inside its penumbra).
Choosing penumbra sizes to facilitate reduction of uncertainty will
require programs that can analyse aspects of the structure of a
scene and detect whether some relationship introduces uncertainty in
predictions. Then choosing a penumbra size to use when selecting a
movement that is certain not to produce a collision will be a
task-dependent problem.
[All this is closely related to Brian Funt's PhD work. See the Funt reference above.]
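Continuing the same assumed 'shapely' representation as in the previous sketch, the following grows a penumbra of specified thickness around each 2-D projection and reports the order in which the three kinds of contact listed above occur as one object approaches another. The library and all names are illustrative assumptions, not a proposal for a specific implementation.

    # Minimal sketch of the 'penumbra' idea: buffer each shape by a chosen
    # thickness and record the first step at which each kind of contact occurs
    # (penumbra-penumbra, penumbra-object, object-object).

    from shapely.geometry import Polygon
    from shapely.affinity import translate

    def contact_events(a, b, thickness, step=(0.5, 0.0), max_steps=200):
        """Slide A towards B and record the first step index for each contact kind."""
        events = {"penumbra-penumbra": None, "penumbra-object": None, "object-object": None}
        a_now = a
        b_pen = b.buffer(thickness)
        for i in range(max_steps):
            a_pen = a_now.buffer(thickness)
            if events["penumbra-penumbra"] is None and a_pen.intersects(b_pen):
                events["penumbra-penumbra"] = i
            if events["penumbra-object"] is None and (a_pen.intersects(b) or a_now.intersects(b_pen)):
                events["penumbra-object"] = i
            if events["object-object"] is None and a_now.intersects(b):
                events["object-object"] = i
                break
            a_now = translate(a_now, xoff=step[0], yoff=step[1])
        return events

    block = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
    wall = Polygon([(10, -2), (11, -2), (11, 4), (10, 4)])
    print(contact_events(block, wall, thickness=0.5))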
NOTE: I suspect that a detailed analysis of the suggestions here
could involve developing some interesting new mathematics.
Arnold Trehub's retinoid mechanism may be useful:
The Cognitive Brain (MIT press, 1991)
http://www.people.umass.edu/trehub/
As mentioned above this work on predicting affordance changes is
related to my recent work with Jackie Chappell on GLs (Generalised
Languages) evolved for 'internal' use in precursors of humans as
well as many other mammals, e.g. chimpanzees and possibly hunting
mammals, and in some bird species. GLs are also required by
pre-verbal children. See
http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#glang
What evolved first: Languages for communicating, or languages
for thinking (Generalised Languages: GLs)
If a robot can perform actions in order to change affordances,
whether action affordances or epistemic affordances, then this
provides a natural topic for situated dialogue.
Examples:
Why are you hesitating?
To check whether my hand will bump into the cube
Why did you move your head left?
To get a better view of the size of the gap between the cube and
the block
Can your hand fit through the gap between the two blocks?
I am not sure, but I'll try
Can your hand fit through the gap between the two blocks?
I am not sure, but I can move them apart to make sure it can.
Is the block within your reach?
Yes, because I just placed a cube next to it.
How can you get the cube past the block?
Move it further to the right to make sure it will not bump into
the block then push it forward.
etc. etc.
There is a wide variety of propositions, questions, goals, plans,
and actions, dealing with a collection of spatial, causal and
epistemic relationships that can change. If we choose a principled,
but non-trivial subset related to what the robot can perceive, plan,
reason about, and achieve in its actions, then that defines a set of
questions, commands, assertions, explanations, that can occur in a
dialogue.
At a later date we could move back to an earlier stage and instead
of building all the above competence in, enable the robot to learn
some of it.
That will require working out a suitable initial state, including
initial forms of representation, competences, and architecture that
is able to support the development of a suitable altricial
competence.
See
http://www.cs.bham.ac.uk/research/projects/cosy/papers/#tr0609
COSY-TR-0609 (PDF):
Natural and artificial meta-configured altricial
information-processing systems
Jackie Chappell and Aaron Sloman
Invited contribution to a special issue of The International
Journal of Unconventional Computing
Vol 2, Issue 3, 2007, pp. 211--239,
The movies at the end of this file show a mug, a pen and a hand
holding the pen and moving it in various orientations relative to
the mug, so as to change the affordances: e.g. some positions
restrict some vertical motions and some positions restrict some
horizontal motions, e.g. left-right horizontal movements, or
front-back horizontal movements, or rotational movements.
You can easily do experiments yourself, holding a pen near, above,
inside, a mug and moving it in various ways (including translations
and rotations). Consider what predictions you can make about how
further movements will or will not be constrained if you continue a
particular movement. E.g. will the pen make contact with a part of
the mug that will constrain further movement? Will continued motion
bring the end of the pen into the mug so that further movement
sideways and down is constrained by the mug? If motion of the pen is
already constrained, what movements would alter the relationships so
as to remove the constraint? Consider also what predictions you can
make about what information will and will not be available to you.
A task for a vision system is to be able to see a movement and to
predict that IF that movement continues THEN the relationships
between the pen and the mug will (or will not) change in specific
ways so as to restrict further movements.
The .avi files below are very short simple movies taken with a
webcam using the spcaview utility, available from here:
http://mxhaard.free.fr/download.html
In order to display them you also need spcaview, which can be run as
follows:
spcaview -i filename
NOTE:
In order to compile spcaview, you need SDL and SDL-devel
libraries available, if needed, from:
http://www.libsdl.org/
That package used to include the SDL_image and SDL_image-devel
libraries, but no longer does so for some reason.
They are available from here:
http://www.libsdl.org/projects/SDL_image/
These videos were taken in 2005 and may no longer be playable
Zip file: All the above can be downloaded in one zip file, all-videos.zip (about 20MB). (Offers of help converting to a better format welcome.)
http://www.cs.bham.ac.uk/research/projects/cosy/photos/crane
Challenge for Vision: Seeing a Toy Crane
Crane-episodic-memory
http://www.cs.bham.ac.uk/research/projects/cogaff/96-99.html#15
"Actual Possibilities", in
Principles of Knowledge Representation and Reasoning:
Proceedings of the Fifth International Conference (KR '96),
Morgan Kaufmann Publishers, 1996, pp 627-638,