
Seeing Possibilities For a Cup And Saucer

(VERY EARLY DRAFT: Still changing rapidly, so saved
copies will soon be out of date. Save links instead!)

Aaron Sloman
School of Computer Science, University of Birmingham.
(Philosopher in a Computer Science department)

Installed: 17 Jul 2014
(Using pictures taken in 2005)
Last updated: XXX
____________________________________________________________________________

This paper is
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/cup-saucer-challenge.html
A PDF version may be added later.
A closely related document, using these pictures, was written during the EU CoSy
robot project, and made available here:
http://www.cs.bham.ac.uk/research/projects/cogaff/07.html#708
     "Perception of structure: Anyone Interested?"

Two other closely related documents written around the same time are
http://www.cs.bham.ac.uk/research/projects/cogaff/07.html#709
     Perception of structure 2: Impossible Objects
http://www.cs.bham.ac.uk/research/projects/cosy/photos/crane/
     Challenge for Vision: Seeing a Toy Crane -- Crane-episodic-memory

A partial index of discussion notes is in
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/AREADME.html

____________________________________________________________________________

Some Challenges for a Visual System

Consider these two scenes:
    [Image: Saucer on cup]         [Image: Cup on saucer]
These two pictures were taken using the same three objects arranged differently.
____________________________________________________________________________

Question:
Using a two-finger gripper, what actions can get from the situation on the left
(or any situation with partly similar initial relationships) to the situation on
the right (or a similar situation), and back again? Think about how you could do
that, before reading on.

Discussion:
Notice that no known vision system, computer-based or human, can determine exact
directions, distances, curvatures, orientations, thicknesses, and other spatial
properties and relations from these two images, partly because they are low
resolution images taken in poor light (late at night in a hotel bedroom with one
ceiling light not working!), and partly because a single 2-D image cannot in
principle provide exact distances and sizes. So anyone who understands the above
question and thinks about a possible answer must be using interpretations of the
images that abstract from precise metrical details. Many vision researchers
assume that the abstraction has to be done by replacing precise metrical values
with probability distributions over such values, but there is another way: using
partial orderings to relate parts of the scene rather than absolute values.
Orderings can be of many kinds: further away, further apart, wider, more
curved, thicker, shallower, sloping more steeply, changing curvature more
rapidly in a certain direction, and so on. Where processes are involved, again
instead of specifying exact directions, velocities, and accelerations in some
coordinate system, for many purposes it may suffice to use partial orderings
based on relations like: moving faster than, changing speed faster than,
changing direction faster than, rotating faster than, and many more, including
comparisons of acceleration (rates of rates of change).
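The idea of replacing numbers with orderings can be made concrete in a few
lines of code. The sketch below is an illustration invented for this note, not
part of any cited work, and the object names (cup_rim, spoon_bowl, etc.) are
assumptions: it stores only "wider than" pairs and derives further comparisons
by transitivity, leaving unrecorded pairs genuinely undecided, as a partial
ordering allows.

```python
# Minimal sketch: scene knowledge as a partial ordering ("wider than")
# rather than numeric widths. Object names are illustrative assumptions.

def transitive_closure(pairs):
    """Derive all orderings entailed by the given (a, b) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Partial knowledge gleaned from the image: no numbers needed.
wider_than = transitive_closure({
    ("saucer", "cup_rim"),
    ("cup_rim", "cup_base"),
    ("saucer", "spoon_bowl"),
})

assert ("saucer", "cup_base") in wider_than       # follows by transitivity
# The ordering is partial: cup_base vs spoon_bowl is left undecided.
assert ("cup_base", "spoon_bowl") not in wider_than
assert ("spoon_bowl", "cup_base") not in wider_than
```

Note that the last two assertions together express incomparability: neither
direction of the comparison is derivable, which is exactly the abstraction
from metrical detail described above.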

In some cases instead of processes being described in absolute or relative
spatial terms they can be described at an even higher level of abstraction,
in terms of changes in affordances that are produced by motion either of things
perceived, or of the viewer. This can include changes in proto-affordances:
changes in possibilities for motion or changes in possible interactions between
things, with no agents' actions or needs being involved.

An extended discussion of opportunities for using partial orderings instead of
probability distributions to deal with uncertainty or poor data can be found in:
   http://www.cs.bham.ac.uk/research/projects/cogaff/07.html#718
   Predicting Affordance Changes
   (Alternative ways to deal with uncertainty)

Further questions
Earlier you were asked to think about how you might rearrange the objects in
order to get from a configuration like the first to a configuration like the
second. Are you able to describe, not the actions, but how you thought about the
actions, including the intermediate stages and linking processes that you
thought about? Did you need to consider any exact distances, widths, directions,
weights, or other geometric or physical properties or relations?

Did you consider which of your body parts you would use, how the appearance of
the scene would change, and what information you would use about the changes
when selecting and controlling actions?

Normally we can plan actions without considering those details, because we know
that we have mastery of the familiar types of sub-task required, and this
manipulation task is not a difficult task (for a normal adult in our culture),
unlike some puzzles that most people find difficult, such as the Fisherman's
Folly puzzle, which requires separating the metal ring from the rest of the
object without cutting or breaking anything.

    [Image: the Fisherman's Folly puzzle]

Image from:
   Pedro Cabalar, Paulo E. Santos, (2011)
   Formalising the Fisherman's Folly puzzle, in
   Artificial Intelligence, 175, 1, pp. 346--377, 2011,
   Issue on John McCarthy's Legacy,
   http://www.sciencedirect.com/science/article/pii/S0004370210000408

   (That paper shows (a) how the puzzle can be "translated" into a
   logical problem, which most humans can't do, and (b) how an AI
   planning program can solve it, in the translated form. They make no
   claims or promises about automating the translation of the puzzle
   into a logical form.)
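Point (b) can be illustrated in miniature. The sketch below is not the
Cabalar--Santos formalisation: the states, operators, and object names are
invented here, using the cup-and-saucer rearrangement rather than the Folly
puzzle. It shows only the general idea that once a manipulation problem has
been translated into discrete states and operators, a simple breadth-first
planner can solve it mechanically.

```python
# Toy planning sketch (invented for illustration): a state is a tuple of
# stacks, each stack a tuple of objects from bottom to top.
from collections import deque

def moves(state):
    """Yield (action, new_state) pairs: move the top item of one stack
    onto another stack, or onto an empty spot ("table")."""
    for i, src in enumerate(state):
        if not src:
            continue
        item = src[-1]
        for j, dst in enumerate(state):
            if i == j:
                continue
            new = [list(s) for s in state]
            new[i].pop()
            new[j].append(item)
            where = dst[-1] if dst else "table"
            yield ("move %s onto %s" % (item, where),
                   tuple(tuple(s) for s in new))

def plan(start, reached_goal):
    """Breadth-first search over states; returns a shortest action list."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, actions = frontier.popleft()
        if reached_goal(state):
            return actions
        for action, nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [action]))
    return None

# Left picture, crudely: saucer resting on the cup, one free table spot.
start = (("cup", "saucer"), ())
print(plan(start, lambda s: ("saucer", "cup") in s))
# -> ['move saucer onto table', 'move cup onto saucer']
```

What such a program cannot do, as the parenthesis above stresses, is produce
the translation itself: the states and operators had to be hand-crafted.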

Returning to the Crockery challenge

Consider how, prior to the action, the agent (one who has not discovered the
translation of Cabalar and Santos) has to solve several sub-problems.

Could such deliberative premeditation use an action schema (or operator) with
approximate, qualitative parameters instead of the more definite actual
parameters that would be used (explicitly or implicitly) if the action were
performed?
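One way to picture such an action schema is sketched below. This is an invented
illustration, not a proposal from any cited work, and all the predicate and
object names are assumptions: the operator's applicability test uses only
partial-order comparisons, never metrical values.

```python
# Illustrative action schema whose preconditions are qualitative
# (partial-order) comparisons. All names are invented for this sketch.

def graspable(orderings, target):
    """Deem a two-finger grasp of `target` feasible iff the gripper is
    known to open wider than the target, and the target is not known to
    exceed the payload limit -- pure ordering tests, no numbers."""
    return (("gripper_span", target) in orderings["wider_than"]
            and (target, "payload_limit") not in orderings["heavier_than"])

orderings = {
    "wider_than":   {("gripper_span", "cup_rim"), ("gripper_span", "spoon")},
    "heavier_than": {("full_kettle", "payload_limit")},
}

assert graspable(orderings, "cup_rim")
# No ordering recorded for the saucer: undecided, so not deemed feasible.
assert not graspable(orderings, "saucer")
```

The exact metrical parameters would only be needed (explicitly or implicitly)
when the grasp is actually executed and servo-controlled, which is consistent
with the suggestion in the question above.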

NOTE:
There are problems here partly analogous to problems of reference and
identification in language, except that the mode of reference is not linguistic
and what is referred to typically cannot be expressed in language because it is
anchored in non-shared structures and processes.

(Internal `attention' processes are partly like external pointing processes:
virtual fingers -- in some cases because they exhibit 'causal indexicality',
i.e. implicitly referring to the results of learning, or selective attention,
achieved by some internal learning mechanism, as pointed out in the paper
below.)

   http://www.cs.bham.ac.uk/research/projects/cogaff/03.html#200302
   Aaron Sloman and Ron Chrisley,
   Virtual machines and consciousness,
   Journal of Consciousness Studies, 10, 4-5, 2003, pp. 113--172,
   NOTE:
   A detailed commentary (and tutorial) on this paper by Marcel
   Kvassay, comparing and contrasting our ideas with the anti-reductionism
   of David Chalmers, was posted on August 16, 2012:
   http://marcelkvassay.net/machines.php

____________________________________________________________________________

Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham