School of Computer Science THE UNIVERSITY OF BIRMINGHAM CoSy project CogX project


(Learning to see a set of moving lines as a rotating cube.)

Aaron Sloman

Last updated: 20 Jan 2017
23 Jul 2008; 16 May 2010; 25 Sep 2010; 19 Sep 2011

There is also a pdf version (which may sometimes lag behind the html version):

I may try later to improve the format of this file.

An email list for discussion of this and related topics is available here:
(Note added four years later: the list produced no significant responses to the challenge.)


This started life as a message circulated to colleagues in May 2006 about
preconditions for a certain demanding kind of learning -- which does not seem to be
easily expressed in terms of curve fitting, matching humans on some classification
test, or any other simple criterion for success.

The question relates to the possibility of a learning system that starts with 2-D
visual capabilities and an ontology that includes 2-D points, lines, junctions, and
motion in a 2-D plane, and then invents for itself the idea of a 3-D space with
structures and motions of various kinds which explains the complex sensed 2-D
phenomena as a projection from a changing 3-D world.

Compare Plato's metaphor of people imprisoned in a cave able to see only shadows on
the cave wall: what learning mechanism could cause them to come up with the idea of
the moving 3-D objects generating those shadows, if they had not had concepts of 3-D
structures and processes?

Many learning systems do dimensionality reduction.
This problem requires dimensionality expansion.

The Necker Cube

[Figure: Necker cube]

You should be able to see this as either a 2-D collection of lines or as a 3-D
wire-frame cube. In the latter case, if you look at it long enough it should 'flip'
between two 3-D structures which differ only in the 3-D locations and orientations of
the parts, not in their 2-D projections. So the difference does not involve any
change in the visual input signal. Some examples of moving 2-D depictions of cubes are given below.
The first is a rotating wire-frame cube. Because of the ambiguity displayed above,
the rotating cube is ambiguous between rotating left and rotating right: it can be
seen both ways, without any change in the 2-D visual signal.
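The ambiguity can be checked numerically. The sketch below assumes parallel (orthographic) projection, as with a very distant light, so depth is simply dropped; the function names are hypothetical, chosen for this illustration. It compares a wire-frame cube spinning one way about the vertical axis with its front-to-back mirror image spinning the other way. The two readings disagree about which corners are nearer the viewer, yet in every frame each corner casts the same 2-D shadow.

```python
import math
from itertools import product

# Corners of a unit wire-frame cube centred on the origin.
CUBE = list(product((-0.5, 0.5), repeat=3))


def rotate_y(p, theta):
    """Rotate point p about the vertical (y) axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def mirror_depth(p):
    """Reflect a point front-to-back (z -> -z): the other Necker reading."""
    x, y, z = p
    return (x, y, -z)


def project(p):
    """Parallel projection onto the screen: the depth coordinate is dropped."""
    x, y, _ = p
    return (x, y)


def same_shadows(frames=36):
    """True if, in every frame, the cube turned by +theta and its depth-mirror
    turned by -theta project every corner to the same screen point."""
    for step in range(frames):
        theta = 2 * math.pi * step / frames
        for p in CUBE:
            ax, ay = project(rotate_y(p, theta))
            bx, by = project(rotate_y(mirror_depth(p), -theta))
            if not (math.isclose(ax, bx, abs_tol=1e-12) and math.isclose(ay, by)):
                return False
    return True


print(same_shadows())  # True
```

Under a nearby point light (perspective rather than parallel projection) the two readings would no longer cast identical shadows, which is one reason the example assumes a distant light.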

Note: the existence of pictures whose interpretation can flip between alternatives
without anything changing in the visual input is one of many reasons for taking
qualia seriously, as discussed in

Online rotating cubes and other 3-D structures.

Message posted on May 19th 2006

From Aaron Sloman Fri May 19 12:44:32 BST 2006
To: nature-nurture
Subject: a theoretical question for theorists of various kinds

[Sorry I could not find a way to compress this question.]

As I am sure you are all aware, there are debates about which
concepts, knowledge, formalisms, algorithms and architectural
features are innate and which are products of learning and development.

In part that's an empirical question which may have different answers
for different animal species -- for instance newly hatched chicks and
newborn deer can do things almost immediately that other species seem
to have to learn to do. Some of you have heard Jackie Chappell talk
about that with examples.

There are also theoretical/mathematical questions about what sorts of
learning can and cannot be done by various kinds of mechanisms.

Is there a general purpose learning mechanism that can reliably and
quickly discover the need for a 3-D (e.g. Euclidean) ontology of lines,
surfaces, opaque bodies, etc., simply on the basis of 2-D visual data and
motor control signals?

There are some people in AI, philosophy, and psychology who believe
that there exist very general learning mechanisms that make it possible
for something like a human being, or a future human-like robot, to
function with concepts *all* of which have been derived from experience
(e.g. by data-mining in observed sensorimotor contingencies, or by
training neural nets, or some such thing).

Such people often talk about 'symbol grounding', claiming that all
meanings have to be 'grounded' in sensory experiences, or some such
thing. (I think the phrase was invented by Stevan Harnad, about 15 years ago.)

So that raises the question: assuming that all sensory inputs and all
motor outputs form collections of one- or two-dimensional arrays of
signals, is there some way in which a general purpose learning mechanism
with NO built-in knowledge of structures and processes in 3-D Euclidean
space (e.g. something based on trying to minimise the algorithmic complexity
of prediction mechanisms, or something else) can 'discover' the need
for an ontology which includes changing 3-D structures, including rigid
and flexible rods and strings, and surfaces in various orientations, some
curved and some not, some wholly or partly invisible some of the time but
able to become visible through movements in 3-D, either of the objects
containing those surfaces, or of other intervening surfaces, or of the viewer?

Can this richer ontology be discovered simply by running some general
purpose learning mechanism?

In some sense the answer is obviously Yes: evolution produced animals
such as us with a grasp of concepts of 3-D structures and processes,
starting from a world that did not contain entities with such concepts.

And evolution is a general purpose learning mechanism.

But evolution takes evolutionary time-scales, and in a different
environment this might not happen.

Is there a learning mechanism that can do it reliably, repeatedly (e.g.
for all instances of a species) in a fairly short time, e.g. at most a
few months or years of learning?

Here is an example:

Suppose you have a machine (Robby the robot) that gets 2-D visual input
from a translucent screen. Behind the screen is a 3-D wire-frame cube
and a distant light projecting shadows of the wire edges onto the screen:

    light      cube    screen  robot

                          |      o
    *           []        |     [+]

If normal adult humans see the shadows we typically see not only the 2-D
configuration of dark and light points, and lines made of such points,
but also a 3-D structure. Actually we may find that it is ambiguous
(i.e. the Necker cube) and we switch between seeing two different 3-D
structures while the 2-D structure remains unchanged.

Suppose the cube rotates about a 3-D axis and there is an array of
knobs, levers or other devices whose state allows 1-D continuous change
(e.g. rotation of a knob), whose effect is to alter the position or
orientation of the axis and the speed of rotation of the cube, or
perhaps just the cube's orientation relative to the axis. (I think 4 knobs
suffice to determine the exact 3-D location of the axis; additional
knobs (4 more?) can determine the position of the cube relative
to the axis and the orientation of the cube around the axis.)
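One way to make the knob set-up concrete, offered only as a hypothetical parametrisation: a line in 3-D space has four degrees of freedom, so four knobs can fix the axis; the cube's starting position and orientation relative to the axis take up to six more (three each, if nothing else is constrained); and one more knob sets the angular speed. The sketch below assumes parallel projection onto a screen plane z = 0 and maps such a setting, plus a time t, to the shadows of the cube's twelve edges. All names are illustrative. In the challenge, Robby would see only the output of `shadow_segments`, never what the knobs mean.

```python
import math
from itertools import product

# Unit cube in its own (body) coordinates, and its 12 edges: pairs of
# corners that differ in exactly one coordinate.
CUBE = list(product((-0.5, 0.5), repeat=3))
EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if sum(a != b for a, b in zip(CUBE[i], CUBE[j])) == 1]


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def scale(k, v):
    return tuple(k * a for a in v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def rotate(v, d, theta):
    """Rodrigues' formula: rotate vector v by theta about the unit direction d."""
    c, s = math.cos(theta), math.sin(theta)
    return add(add(scale(c, v), scale(s, cross(d, v))),
               scale(dot(d, v) * (1 - c), d))


def axis_frame(azimuth, elevation):
    """Unit axis direction d, plus unit vectors e1, e2 perpendicular to d."""
    d = (math.cos(elevation) * math.cos(azimuth),
         math.cos(elevation) * math.sin(azimuth),
         math.sin(elevation))
    e1 = (-math.sin(azimuth), math.cos(azimuth), 0.0)
    return d, e1, cross(d, e1)


def shadow_segments(knobs, t):
    """The 12 projected cube edges at time t for one hypothetical 11-knob setting.

    knobs[0:4]   azimuth, elevation, a1, a2: where the spin axis is
    knobs[4:10]  u1, u2, u3, alpha, beta, gamma: the cube's starting position
                 and orientation relative to the axis
    knobs[10]    omega: angular speed of the spin
    """
    azimuth, elevation, a1, a2, u1, u2, u3, alpha, beta, gamma, omega = knobs
    d, e1, e2 = axis_frame(azimuth, elevation)
    on_axis = add(scale(a1, e1), scale(a2, e2))  # the axis passes through here
    centre = add(on_axis, add(add(scale(u1, e1), scale(u2, e2)), scale(u3, d)))
    flat = []
    for v in CUBE:
        # Fixed starting orientation: turn about x, then y, then z.
        for axis, angle in (((1.0, 0.0, 0.0), alpha), ((0.0, 1.0, 0.0), beta),
                            ((0.0, 0.0, 1.0), gamma)):
            v = rotate(v, axis, angle)
        p = add(centre, v)
        p = add(on_axis, rotate(sub(p, on_axis), d, omega * t))  # spin about the axis
        flat.append((p[0], p[1]))  # parallel projection: drop the depth z
    return [(flat[i], flat[j]) for i, j in EDGES]


knobs = (0.4, 0.7, 0.1, -0.2, 0.3, 0.0, 0.1, 0.2, 0.5, 0.1, 1.0)
print(len(shadow_segments(knobs, 0.0)))  # 12
```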

Now suppose Robby can turn the knobs, or move the levers, that determine
the 3-D position and orientation of the wire cube -- though Robby
initially does not have any conception of 3-D position or orientation.
However, Robby can see the changing 2-D patterns produced by those changes.

Now suppose Robby has to be able to predict the consequences of any
combination of knob or lever changes. Initially these consequences will
be sensed only in terms of points on the screen changing from black to
white or vice versa. But suppose Robby has 2-D line-finding algorithms,
and can also therefore identify lines and observe how they move, which
junctions between lines form or disappear as the lines move, when the
lines change direction of movement, or perhaps change their size (if the
projection of the shadows is not fully orthogonal).
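Robby's starting ontology of points, lines and junctions needs only plane geometry. As a hypothetical sketch that starts from line segments rather than raw black and white points, the helpers below find the places where two shadow lines cross (X-junctions) using the standard parametric segment-intersection test, and separately count how many lines meet at each shared endpoint (corner junctions). Frame-by-frame changes in these lists are the kind of 2-D description a learner could start from.

```python
from itertools import combinations


def crossing_point(seg1, seg2, eps=1e-12):
    """Interior crossing point of two 2-D segments, or None.

    Solves p + t*r == q + u*s and accepts only 0 < t, u < 1, so segments that
    are parallel, that only touch at an endpoint, or that miss each other all
    give None (endpoint contacts are counted by corner_junctions instead).
    """
    (p, p_end), (q, q_end) = seg1, seg2
    r = (p_end[0] - p[0], p_end[1] - p[1])
    s = (q_end[0] - q[0], q_end[1] - q[1])
    denom = r[0] * s[1] - r[1] * s[0]  # 2-D cross product r x s
    if abs(denom) < eps:
        return None  # parallel or collinear
    qp = (q[0] - p[0], q[1] - p[1])
    t = (qp[0] * s[1] - qp[1] * s[0]) / denom
    u = (qp[0] * r[1] - qp[1] * r[0]) / denom
    if eps < t < 1 - eps and eps < u < 1 - eps:
        return (p[0] + t * r[0], p[1] + t * r[1])
    return None


def crossings(segments):
    """Every interior crossing (X-junction) between pairs of segments."""
    found = []
    for a, b in combinations(segments, 2):
        point = crossing_point(a, b)
        if point is not None:
            found.append(point)
    return found


def corner_junctions(segments, ndigits=9):
    """Map each endpoint shared by two or more segments to the number meeting there."""
    counts = {}
    for segment in segments:
        for end in segment:
            key = (round(end[0], ndigits), round(end[1], ndigits))
            counts[key] = counts.get(key, 0) + 1
    return {point: n for point, n in counts.items() if n >= 2}


# A static Necker-cube drawing: front square, offset back square, 4 connectors.
front = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
back = [(x + 0.4, y + 0.3) for x, y in front]
necker = ([(front[i], front[(i + 1) % 4]) for i in range(4)]
          + [(back[i], back[(i + 1) % 4]) for i in range(4)]
          + [(front[i], back[i]) for i in range(4)])
print(len(crossings(necker)))                      # 2
print(sorted(corner_junctions(necker).values()))  # [3, 3, 3, 3, 3, 3, 3, 3]
```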

Your task is to produce a general-purpose learning algorithm that can be
given merely the 2-D visual data, the ability to change the state of the
controls, and the task of continually predicting what will happen next.

On this basis it should be able to discover the power of a 3-D
representation which can be projected onto a 2-D surface, where in the
3-D representation the metrical relations of the edges relative to each
other do not change, even though in 2-D the relations between lines
(e.g. angles, relative sizes, intersection points) are constantly changing.

This is a case of reducing dimensionality (of one kind -- variation in
data) by increasing dimensionality of another kind (in the explanatory
hypotheses). But there is structure as well as dimensionality.
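A small numerical check of this contrast, under illustrative assumptions (parallel projection, and a cube tipped forward and spinning about the vertical axis): over one full turn no edge changes length in 3-D, while every edge's shadow changes length. The function names are hypothetical. The fixed 3-D metric relations are what the richer representation offers a predictor.

```python
import math
from itertools import product

# Unit cube centred on the origin and its 12 edges (corner pairs differing in
# exactly one coordinate).
CUBE = list(product((-0.5, 0.5), repeat=3))
EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if sum(a != b for a, b in zip(CUBE[i], CUBE[j])) == 1]


def pose(p, theta, tilt=0.6):
    """Tip the cube forward by `tilt` about the x axis, then spin it by theta
    about the vertical y axis."""
    x, y, z = p
    ct, st = math.cos(tilt), math.sin(tilt)
    y, z = ct * y - st * z, st * y + ct * z
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def spread(series):
    """max - min of each list in `series`."""
    return [max(values) - min(values) for values in series]


def length_ranges(frames=40):
    """For each edge, how much its length varies over one full turn,
    measured in 3-D and in the 2-D shadow (depth z dropped)."""
    lengths_3d = [[] for _ in EDGES]
    lengths_2d = [[] for _ in EDGES]
    for k in range(frames):
        pts = [pose(p, 2 * math.pi * k / frames) for p in CUBE]
        for e, (i, j) in enumerate(EDGES):
            lengths_3d[e].append(math.dist(pts[i], pts[j]))
            lengths_2d[e].append(math.dist(pts[i][:2], pts[j][:2]))
    return spread(lengths_3d), spread(lengths_2d)


r3, r2 = length_ranges()
print(round(max(r3), 9))  # 0.0: no edge changes length in 3-D
print(round(min(r2), 2))  # 0.17: even the steadiest shadow edge changes length
```

The tilt matters: with no tilt, edges parallel to the spin axis would cast shadows of constant length, so some 2-D relations would stay fixed by accident.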

The learner must do this without being given at the start anything that
includes a specification of 3-D space. E.g. it can't have the general notion
of N-dimensional Euclidean space with hyperplanes, hypercubes, etc., and
then an algorithm that searches for the smallest N that will capture all
the data. It must merely try to learn to predict changes describable in
a formalism that it has to start with (2-D geometry and topology)
and end up discovering a richer formalism (3-D geometry) as the solution
to the learning problem.

Can that be done?

Has anyone done it?
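For contrast, here is a sketch of the kind of shortcut ruled out above. A learner that already had the general notion of an N-dimensional linear space could track the eight corners, stack their shadow coordinates over many frames into a matrix, and measure its rank. Under parallel projection of a rigid body, once each row is centred, that rank is at most 3; this is the observation behind Tomasi and Kanade's factorization method for recovering structure from motion. The code is a hypothetical illustration using plain Gaussian elimination, with an assumed tolerance and an assumed motion (a tipped cube spinning about the vertical). Getting the answer 3 this way presupposes exactly the N-dimensional apparatus the challenge withholds.

```python
import math
from itertools import product

CUBE = list(product((-0.5, 0.5), repeat=3))  # tracked corners of the cube


def pose(p, theta, tilt=0.6):
    """Tip the cube forward about the x axis, then spin it about the vertical y axis."""
    x, y, z = p
    ct, st = math.cos(tilt), math.sin(tilt)
    y, z = ct * y - st * z, st * y + ct * z
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def measurement_matrix(frames=10, shift=(0.3, -0.2)):
    """Two rows per frame (shadow x's, then y's, of the 8 corners), each row
    centred; the arbitrary screen shift is removed by the centring."""
    rows = []
    for k in range(frames):
        pts = [pose(p, 2 * math.pi * k / frames) for p in CUBE]
        for coord in (0, 1):
            row = [q[coord] + shift[coord] for q in pts]
            mean = sum(row) / len(row)
            rows.append([v - mean for v in row])
    return rows


def rank(matrix, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = max(range(r, len(m)), key=lambda i: abs(m[i][col]), default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


print(rank(measurement_matrix()))  # 3
```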

I have a web-page criticising people who talk about learning
sensorimotor contingencies without addressing this issue.

ADDED 25 Sep 2010: A more recent discussion of ontology expansion vs
dimensionality reduction:

This is also related to the discussion of learning orthogonal
recombinable competences here:

Also the distinction between a Humean (correlational, probabilistic)
notion of causation and a Kantian (structure-based, deterministic)
notion of causation, here
    COSY-PR-0506: Two views of child as scientist: Humean and Kantian



Notes added in response to some comments from David Dowe 6 Sep 2006

He drew my attention to the Bayesian MML (Minimum Message Length)
approach described in

> Dowe, Gardner and Oppy (2006+) Brit J Philos Sci (BJPS) paper,
>    "Bayes Not Bust! Why Simplicity is no problem for Bayesians" at
> under "current version".
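As a crude numerical illustration of the two-part idea behind MML (not Dowe et al.'s calculation, and with every number an assumption), compare sending the shadow data verbatim with sending a 3-D theory plus its parameters. The theory cost `theory_bits` is an unknown placeholder, and the 3-D theory is assumed to predict exactly, so the data cost nothing more once it is stated. However large that fixed cost, it is eventually repaid as frames accumulate; this says nothing about how a learner would find the theory, which is the question at issue.

```python
def message_lengths(frames, theory_bits, bits_per_number=16,
                    numbers_per_frame=16, parameters=11):
    """Crude two-part message lengths (bits) for sending `frames` shadow frames.

    memorise: no theory; every shadow coordinate is sent verbatim
              (8 corners x 2 coordinates = 16 numbers per frame by default).
    cube_3d:  a 3-D theory costing `theory_bits` (an assumed figure) plus its
              parameters (e.g. 11 numbers for axis, pose and speed), assuming
              its predictions are exact.
    """
    memorise = frames * numbers_per_frame * bits_per_number
    cube_3d = theory_bits + parameters * bits_per_number
    return memorise, cube_3d


def break_even_frames(theory_bits, bits_per_number=16,
                      numbers_per_frame=16, parameters=11):
    """Smallest number of frames for which the 3-D message is strictly shorter."""
    cube_3d = theory_bits + parameters * bits_per_number
    return cube_3d // (numbers_per_frame * bits_per_number) + 1


print(message_lengths(100, theory_bits=10_000))  # (25600, 10176)
print(break_even_frames(10_000))                 # 40
```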

In one of my discussions of the rotating cube problem (I forget where:
it was probably correspondence with a colleague) I pointed out that if
the learning system had a meta-syntactic competence that allowed it to
generate (enumerate) every kind of grammar expressible in one of the
standard formalisms, and for each such grammar to enumerate every
possible theory expressible in that grammar, then by systematically
exploring ever more complex theories (which would be enumerable),
including theories with increasing numbers of undefined symbols in
addition to the symbols required to describe the directly sensed 2-D
structures and processes specified in the experiment, it could look for
those theories that allowed the observed behaviour of the moving shadows
to be explained and predicted, and would eventually come to the shortest
one, which may or may not be one that we would interpret as describing a
rotating 3-D structure.

(But it would also eventually come to one like that, even if it is not
the simplest theory meeting the requirements.)
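The systematic search can be sketched in miniature. In the toy below, which is hypothetical throughout, a "theory" is a postfix (RPN) expression over a five-token alphabet and the data are an integer sequence indexed by t. Candidates are generated in order of increasing length, so the first one that reproduces the observed data is a shortest theory in that language. Nothing here describes geometry; it only shows the generate-and-test structure and how quickly the candidate count grows.

```python
from itertools import product

TOKENS = ("t", "1", "2", "+", "*")  # hypothetical theory alphabet


def evaluate(theory, t):
    """Evaluate a postfix expression at time t; None if it is malformed."""
    stack = []
    for tok in theory:
        if tok == "t":
            stack.append(t)
        elif tok in ("1", "2"):
            stack.append(int(tok))
        elif len(stack) < 2:
            return None  # operator with too few operands
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a * b)
    return stack[0] if len(stack) == 1 else None


def shortest_theory(data, max_len=7):
    """Enumerate theories by increasing length; return the first that fits,
    together with how many candidates were tested."""
    tested = 0
    for length in range(1, max_len + 1):
        for theory in product(TOKENS, repeat=length):
            tested += 1
            if all(evaluate(theory, t) == x for t, x in enumerate(data)):
                return "".join(theory), tested
    return None, tested


print(shortest_theory([1, 3, 5, 7, 9]))  # ('tt1++', 824)
```

With k tokens there are k^n candidate strings of length n, so here 780 candidates (5 + 25 + 125 + 625) are tested before length 5 is even reached; each additional symbol in the shortest adequate theory multiplies the work by about k.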

However, I have not yet tried to formulate, in a logical format, a theory
of 3-D spatial structure that would accommodate the rotating cube and
its projection to a 2-D plane. I know it can be done -- and perhaps
David Hilbert's axiomatisation of Euclidean geometry is rich enough?

I assume a Bayesian version would replace the systematic search by a
randomised one using some biases, but the principle is the same?
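A randomised variant can be sketched with a toy theory language of postfix expressions. As an assumption for illustration, not a claim about any particular Bayesian method, a candidate's length n is drawn with probability proportional to 2^-n (a prior favouring short theories) and its tokens are drawn independently; the search stops at the first sample that reproduces the data. The optional token weights are one crude way of building in the kind of bias discussed below. All names are hypothetical.

```python
import random

TOKENS = ("t", "1", "2", "+", "*")  # hypothetical theory alphabet


def evaluate(theory, t):
    """Evaluate a postfix expression at time t; None if it is malformed."""
    stack = []
    for tok in theory:
        if tok == "t":
            stack.append(t)
        elif tok in ("1", "2"):
            stack.append(int(tok))
        elif len(stack) < 2:
            return None  # operator with too few operands
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a * b)
    return stack[0] if len(stack) == 1 else None


def fits(theory, data):
    """True if the theory reproduces every observed value."""
    return all(evaluate(theory, t) == x for t, x in enumerate(data))


def random_search(data, weights=None, max_len=9, max_samples=200_000, seed=0):
    """Sample theories at random, biased towards short ones; return the first fit.

    A length n in 1..max_len is drawn with probability proportional to 2**-n,
    then n tokens are drawn independently (uniformly unless `weights` is given).
    Returns (theory, samples_drawn), or (None, max_samples) on failure.
    """
    rng = random.Random(seed)
    lengths = list(range(1, max_len + 1))
    length_prior = [2.0 ** -n for n in lengths]
    for sample in range(1, max_samples + 1):
        n = rng.choices(lengths, weights=length_prior)[0]
        theory = rng.choices(TOKENS, weights=weights, k=n)
        if fits(theory, data):
            return "".join(theory), sample
    return None, max_samples


print(random_search([1, 3, 5, 7, 9]))
print(random_search([1, 3, 5, 7, 9], weights=(3, 2, 1, 2, 1)))
```

The theory found, and how many samples it takes, depend on the seed and the weights; the point is only that a bias in the sampling distribution, like an ordering bias in a systematic search, changes which theories are reached first.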

I suspect, but can't prove, that such processes involve such vast search
spaces that they would take evolutionary time scales rather than
individual learning time-scales, unless there was some bias in the
ordering of grammars and theories, so as to favour the early production
of what we would recognise as a theory whose ontology includes 3-D
spatial structures and processes.

That, of course, would amount to an innate preference for theories that
refer to 3-D structures. (Perhaps millions of years of evolution have
produced such innate preferences in human and some other animal brains.
Can we model this?)

I.e. it would not be a totally general learning mechanism, without any
innate biases.


My interests are in explaining how humans and other animals work, and
why they are so diverse in their competences. Most cope with 3-D
structures and processes of some sort, e.g. when feeding, mating,
burrowing, making nests, escaping predators, etc.

Some seem to start with a pretty good, and therefore presumably
genetically determined grasp of at least some aspects of 3-D structures
and processes -- e.g. enough to let new-born deer run with the herd.

It's not clear what humans, chimps or crows have initially, but I
suspect it's much more specific than a preference for an ordering of the
sort mentioned above. It probably has a lot to do with requirements for
being able to suck and to explore many movements of eyes, hands, legs, and
later fingers, tongue, etc., plus a meta-level competence that drives
playful exploration of those devices and uses a criterion of
'interestingness' to store and generalise some results which are later
used as new building blocks for the exploratory process. I've been
trying to develop that idea with a biologist, Jackie Chappell, who works
on animal cognition. Our papers on the topic are listed here, among them
one presented at IJCAI 2005.

Our argument that some of that competence needs to be genetically
determined is based on the claim that no totally general learning system
could achieve the same results in the same time.

Hence the example of the rotating cube as a challenge for someone to
prove that claim wrong.

Further reading

Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham