School of Computer Science THE UNIVERSITY OF BIRMINGHAM CoSy project CogX project

Learning about them, exploring them, seeing them...
Last updated: 27 May 2010
Installed: 27 May 2010

This web page is a place-holder. To be improved later.


Background to this web site

I noticed an advert for a postdoc position at MIT:

   Postdoc in material perception at MIT.
    A postdoctoral position is available at MIT on the perception of
    materials and surfaces, under the supervision of Edward Adelson and
    Ruth Rosenholtz. The goal is to understand, at a computational level,
    the information in an image that allows a subject to recognize
    materials (e.g., wood, glass, fabric, etc.) and their properties
    (e.g., smooth, shiny, translucent, etc). The ideal candidate will have
    strong skills in some combination of visual psychophysics, computer
    graphics, machine vision, and machine learning. Programming in Matlab
    and C++ are required. Start date is flexible.

    Please email a CV, a cover letter explaining your research interests,
    and the names of 3 references, to Edward Adelson (adelson at csail dot
    mit dot edu).

This caught my attention because I have been thinking and writing
about the problems of learning about kinds of stuff for some time, e.g. in:
   Talk 68: Ontologies for baby animals and robots
     From "baby stuff" to the world of adult science: Developmental AI from a Kantian viewpoint.
   Aaron Sloman

So I searched Ted Adelson's web site, and found the following paper:

    On Seeing Stuff: The Perception of Materials by Humans and Machines
    Edward H. Adelson
    Proceedings of the SPIE, Vol. 4299, pp. 1-12,
    Human Vision and Electronic Imaging VI,
    B. E. Rogowitz and T. N. Pappas (Eds.), 2001

That prompted me to write him a message saying:

I have been trying (not very successfully) to get researchers in
vision and robotics to help me think about the kinds of ontologies
that animals and robots need in various environments if they are to
interact, as humans and other animals do, with things in those
environments. They need even more if they are to *understand* those
interactions, e.g. so as to be able to think about them in advance
of doing them, to try retrospectively to understand why something
did or did not happen, or to think about what others (e.g. babies
and toddlers exploring their world) are doing that might hurt or
harm them ("vicarious affordances").

So I was really interested to see this job description. A little
googling took me to your 2001 paper:

    On Seeing Stuff: The Perception of Materials by Humans and Machines

I am amazed at the overlap of interests, including the use of the
word 'stuff'. In several ways your paper goes beyond what I have
written, but I think there are important gaps (which you may have
addressed elsewhere) concerning the role of motion.

I have been arguing that a major subset of the learning that human
infants and toddlers do must be about kinds of stuff, where each
kind is largely defined by its relationships to different shapes, and
to processes involving causal interactions between shapes.

E.g. some kinds of stuff resist change of shape, and break if
forced. Others resist change but allow it to happen, and restore the
original shape when the shape-changing force is removed. Others allow
change with mild or strong resistance, but do not attempt to restore
the shape. Others offer no noticeable resistance.
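As a rough illustration only, the four cases above can be written as a decision procedure over the outcome of a single exploratory action. This is a minimal sketch under stated assumptions: the labels RIGID, ELASTIC, PLASTIC and YIELDING, the ShapeChangeTrial record and the 'noticeable' threshold are hypothetical names chosen here, not part of any existing system. It records only the end state of one trial, not the process itself, so it falls well short of the process-based understanding argued for in this note.

```python
from dataclasses import dataclass
from enum import Enum


class StuffKind(Enum):
    # Hypothetical labels for the four cases listed in the text.
    RIGID = "resists change of shape, breaks if forced"
    ELASTIC = "resists change but allows it, restores shape when released"
    PLASTIC = "allows change with some resistance, does not restore shape"
    YIELDING = "offers no noticeable resistance"


@dataclass
class ShapeChangeTrial:
    """Observed outcome of one exploratory push, squeeze, bend or stretch."""
    resistance: float             # 0.0 = none noticed, 1.0 = strong
    shape_changed: bool           # did the shape change under the force?
    broke: bool                   # did the object break?
    restored_after_release: bool  # did the shape come back when released?


def classify(trial: ShapeChangeTrial, noticeable: float = 0.05) -> StuffKind:
    """Map one trial's outcome to a kind of stuff.

    'noticeable' is an arbitrary threshold chosen for illustration.
    """
    if trial.resistance < noticeable:
        return StuffKind.YIELDING
    if trial.broke or not trial.shape_changed:
        # Held its shape against the force, or gave way only by breaking.
        return StuffKind.RIGID
    if trial.restored_after_release:
        return StuffKind.ELASTIC
    return StuffKind.PLASTIC


# Example: a rubber band springs back; plasticine stays deformed.
print(classify(ShapeChangeTrial(0.4, True, False, True)))   # StuffKind.ELASTIC
print(classify(ShapeChangeTrial(0.3, True, False, False)))  # StuffKind.PLASTIC
```

A single trial can mislead: a rigid object pushed too gently neither changes shape nor breaks, which this sketch simply files under RIGID. A young explorer presumably needs many varied actions, and ways of relating them, before such categories become reliable.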

One of my slide presentations, which seems to baffle most roboticists
and vision researchers (especially the younger ones), is about what I
call 'baby stuff':
   From "baby stuff" to the world of adult science: Developmental AI
   from a Kantian viewpoint.

I shall now modify that presentation to refer to your paper.
    (Now done.)

Your recent advert refers only to the features of materials that
might be visible in static scenes ("the information in an image"),
which is also the main theme of your 2001 paper. However, I think
far more information can come from perception of processes, whether
produced by the perceiver (pushing, pulling, squeezing, pinching,
prodding, stretching, bending, twisting, etc.) or merely observed,
e.g. caused by wind, by colliding objects, or by the actions of
others. Still more comes from processes produced by change of
viewpoint, change of light source, or changes in things seen through
transparent objects or reflected in them.

In fact, without being able to perceive, produce, or think about
processes, an individual will not have the means to grasp the full
semantic content of many of our labels for describing kinds of
stuff.

I suspect that the ability to see material properties in static
images of the sort you show in your paper may be the result of much
learning, in which we first acquire concepts of different kinds of
surface and different kinds of material by exploring the effects of
motion. Once we have those concepts, we fit them onto static images
using constraint satisfaction mechanisms, which sometimes get the
wrong answer.

If that suspicion is correct, it will be difficult, or impossible,
for a visual system that is developed to deal only with static
images to gain the ability to understand as wide a variety of images
showing different kinds of stuff as we can. (I need to see if I can
replace that suspicion with an argument based on examples.)

E.g. we can define a concept of smoothness in terms of mathematical
properties of surfaces, but for humans (and presumably some other
animals and future machines) it will be more important (especially
for non-mathematicians!) to understand smoothness in terms of what
happens when one surface moves relative to another with which it is
in contact. That can include various side-effects of the relative
motion, such as the noise produced, or different kinds of resistance
to relative motion caused by tangential forces (friction and
stiction).

Of course, different frictional properties can be produced by
equally smooth surfaces made of different materials. I return to
this below. So there's a problem of separating out different causes.
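To make the problem of separating causes concrete, here is a minimal sketch using the textbook Coulomb friction model, which is an assumption swapped in for illustration, not a claim about how any animal represents friction. A block is pushed with a steadily growing force: it sticks (stiction) until the push exceeds mu_static times the normal force N, then slides against a constant kinetic friction mu_kinetic times N. The function name push_until_slip and all parameter values are hypothetical. Note that the two coefficients lump together surface texture and material, so an identical trace could come from a smoother surface of one material or a rougher surface of another.

```python
def push_until_slip(mu_static, mu_kinetic, mass=1.0, gravity=9.8,
                    ramp=2.0, dt=0.01, steps=1000):
    """Push a block with a force that grows linearly in time.

    Simple Coulomb friction model (an assumption): the block sticks while
    the push is at most mu_static * N, then slides against a constant
    kinetic friction mu_kinetic * N. Assumes mu_kinetic <= mu_static.
    Returns (time at which sliding began, or None; final velocity).
    """
    normal = mass * gravity
    velocity = 0.0
    slip_time = None
    for i in range(steps):
        t = i * dt
        push = ramp * t  # always in the +x direction
        if velocity == 0.0 and push <= mu_static * normal:
            continue  # stiction: static friction exactly balances the push
        if slip_time is None:
            slip_time = t
        accel = (push - mu_kinetic * normal) / mass
        velocity = max(0.0, velocity + accel * dt)
    return slip_time, velocity


# Hypothetical coefficients for a "grippy" and a "slick" pair of surfaces.
grippy = push_until_slip(0.6, 0.5)
slick = push_until_slip(0.2, 0.15)
print(grippy[0], slick[0])  # slick starts sliding much earlier
```

The simulation only reports when sliding starts and how fast the block ends up moving. It says nothing about whether a low coefficient came from polish or from the material, which is exactly the ambiguity mentioned above.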

'Shiny' and 'translucent' can be defined in terms of static states
(e.g. how much light bounces off the surface and in which
directions, or how much light passes through the material and with
what kind of information loss or distortion, etc.). But both of them
also have implications regarding how appearances change as a result
of motion (e.g. moving highlights).
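The point about moving highlights can be illustrated with the Phong specular term from computer graphics, used here purely as a convenient approximation (an assumption, not a model of human vision). The sketch takes a 2-D cross-section of a flat, shiny surface along y = 0, a fixed light and a viewer, and finds where the highlight is brightest. The function names and the light and viewer positions are hypothetical. Moving the viewer moves the highlight along the surface, whereas a purely matte (Lambertian) surface would look the same from every viewpoint.

```python
import math


def normalize(vx, vy):
    """Scale a 2-D vector to unit length."""
    length = math.hypot(vx, vy)
    return vx / length, vy / length


def specular(px, light, viewer, shininess=50.0):
    """Phong specular term at point (px, 0) on a flat, shiny surface y = 0."""
    lx, ly = normalize(light[0] - px, light[1])
    vx, vy = normalize(viewer[0] - px, viewer[1])
    # Mirror the light direction about the surface normal (0, 1).
    rx, ry = -lx, ly
    return max(0.0, rx * vx + ry * vy) ** shininess


def highlight_position(light, viewer, xs):
    """Return the sample point where the highlight is brightest."""
    return max(xs, key=lambda px: specular(px, light, viewer))


xs = [i * 0.01 for i in range(-500, 501)]  # sample points on the surface
light = (0.0, 2.0)
print(highlight_position(light, (2.0, 2.0), xs))  # near 1.0
print(highlight_position(light, (4.0, 2.0), xs))  # near 2.0
```

For a flat mirror-like surface the peak sits at the mirror-reflection point between light and viewer, so the highlight slides along the surface as the viewer moves. A static image shows only one such position.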

Some researchers I know think that the way to communicate such
concepts to machines is to present lots of labelled examples of
pictures and use some sort of learning system.

But that assumes that data-mining of image features will provide all
the relevant semantics, which must be false if our concepts have
richer links with the causal powers of surfaces and materials. Even
moving to labelled examples of movies will not necessarily achieve
the required results if the learning engine does not have the right
conceptual and representational apparatus to start with.

One of the hardest problems, as far as I can tell, is finding what
form of representation might be developed by a young explorer,
learning about different kinds of process and the various
interactions between structure, matter, motion and applied forces.

I suspect the appropriate representation of space and motion has not
yet been found. It may depend on kinds of mathematics that I won't

Note added: 28 May 2010

It is often assumed that information about motion of objects will
have to be expressed in terms of (or 'grounded in') sensory and
motor signals of the perceiver. This view is a modern revival of
concept empiricism, which Immanuel Kant had demolished by 1781. In
any case, I suggest that believers in symbol grounding theory who
take that line will find it difficult to produce working systems in
which all thinking about kinds of stuff, and their relationships to
shapes and motions, has to be expressed in terms of sensory-motor
signals.

Instead I think we need, and have, a-modal ways of representing
information about contents of the environment, along with ways of
projecting from those ways of thinking so as to predict or explain
the observed sensory-motor statistical relationships.

People born blind or limbless, or with some other sensory or motor
deficiency, nevertheless learn many of the same concepts as the rest
of us, concerning kinds of things that can exist or occur in the
environment.

[It is not often noticed that robots that use SLAM (Simultaneous
Localization and Mapping) end up with topological and metrical
relationships between walls, doors, corridors, spaces of other
kinds, obstacles, etc., which are not represented in terms of the
sensory and motor signals from which the information was derived.
I.e. SLAM often leads to a-modal, exosomatic forms of
representation from which it is possible to derive (or project)
what will be seen, etc., if the machine moves in certain ways.]

See Also

Kristine S. Bourgeois, Alexa W. Khawar, S. Ashley Neal, and Jeffrey J. Lockman
Department of Psychology, Tulane University
    Infant Manual Exploration of Objects, Surfaces, and Their Interrelations
    Infancy, 8(3), 233-252
    DOI: 10.1207/s15327078in0803_3

Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham