School of Computer Science THE UNIVERSITY OF BIRMINGHAM CoSy project

Sensorimotor vs objective contingencies
Aaron Sloman
(With thanks to Kevin O'Regan, Maria Staudte, Achim Jung,
Geert-Jan Kruijff, Jackie Chappell, Arnold Trehub,
and members of the Birmingham CoSy team)

Updated: 2 Oct 2015 (typo)
27 Apr 2008 (Minor corrections)
14 Aug 2007
In the updated version of my poster presentation at the PAC'07 Conference (Bristol, July 2007)
Consciousness in a Multi-layered Multi-functional Labyrinthine Mind
I include a re-interpretation of the sensorimotor theory of colour perception presented by Kevin O'Regan at the conference.
30 Jun 2007
Added in Further Reading section a link to Shimon Edelman's review of Alva Noë 'Action in Perception'.
Updated: 8 Apr 2007
Added in Further Reading section a link to Ned Block's review of Alva Noë 'Action in Perception'.
Updated: 6 Oct 2006
Thanks to Alasdair Turner for pointing out that 'intrasomatic' and 'extrasomatic' mix up Latin and Greek. I have therefore replaced them with 'somatic' and 'exosomatic', below.
Initially written in May 2006
Updated: 30 Jun 2006
This file is
Warning: my use of 'extrasomatic/exosomatic' below has nothing to do with 'out of the body experiences'. Rather it refers to the content of representations: somatic information is information about things happening within the body (e.g. in patterns of sensory and motor signals), whereas exosomatic information is information about things happening in the environment, most of which exist independently of the body of the perceiver. ['Soma' = 'body' (ancient Greek)]



[0] Introduction
I have been trying, with limited success, to get people to understand the importance (for theories of mental processes including learning, perception, reasoning and communication), of a distinction between learning about sensorimotor contingencies (concerned with relations between states, events and processes within an animal or machine, also referred to below as 'somatic') and learning about objective condition-consequence contingencies (concerned with relations between states, events and processes in the environment, also referred to below as 'exosomatic').

The distinction is important for theories of infant development, for the design of robots that act in and learn about their environment, and for philosophical and other theories of embodied cognition. One way in which it is important is that it leads to the question whether, and under what conditions, an individual animal or robot starting only with information about the internal motor and sensor signals can use general learning mechanisms, e.g. self-organising nets, or compression algorithms, to derive, in a reasonable time, e.g. several months or a year or two (rather than evolutionary time scales), an ontology referring to external objects, e.g.

Note added 30 Jun 2006: A PDF slide tutorial introducing these ideas with photographs and links to videos can be found here
We assume that learning mechanisms can have access to the contents of sensor and motor signals and also internal signals between subsystems within the animal or robot. (Actually that assumption needs to be qualified: for a system to be able simultaneously to observe, record, and search for patterns in all combinations of internal signals would require an infinite regress.)

The questions we are asking are:

If discovery of such an ontology is not possible using totally general-purpose learning mechanisms (i.e. mechanisms with no specific 'innate' information about a 3-D physical environment), then such a learner will need some genetically determined mechanisms or structures (which could include forms of representation and ontologies) which steer the learner towards discovery of an appropriate ontology or which provide the basic building blocks for such an ontology.

A specific example of this question is whether totally general learning mechanisms would enable a machine presented with the 2-D shadow of a rotating wire-frame cube to invent concepts of 3-D spatial structure, if all it started with was an ontology of 2-D binary features, which might be moving. This example is discussed at some length, with links to illustrative movies here.

There are examples of rich innate knowledge about the environment
In some precocial species there are clearly very sophisticated competences that are not learnt by individuals, such as the ability of newborn deer shortly after birth to walk to the nipple and start sucking, and soon after that to run with the herd even on uneven terrain. This raises the question whether such animals use any sort of representation of a 3-D environment containing objects with parts and relations that exist whether sensed or acted on or not. It is not obvious whether such animals have all the required mappings between sets of sensor signals (including internal sensors measuring need for food etc.) and motor signals somehow preconfigured by genetic mechanisms in a manner that suffices to explain all their behaviours.

My guess is that the combinatorics required to encode all behaviours in terms of multimodal somatic relationships would be too much for the capacity of either the genome or the brain, at least in the case of grazing mammals, though perhaps not in the case of insects, but I cannot prove this.

An exosomatic ontology allows considerable reductions in the information capacity required, since it allows the animal to use two sets of mappings: mappings between sensory states and environmental descriptions, and mappings between combinations of environmental descriptions, current sensed needs, and output signals. If each environment state can produce K combinations of sensory signals (e.g. depending on viewpoint), and each combination of environmental state and a set of goals can require M sets of motor signals, depending on how the goal is achieved and how the animal is related to the objects in the environment, then storing sensorimotor mappings will require something like K*M information items, whereas storing only the two sets of mappings listed will require only K+M items, which could be a much smaller number. (Try some values of K and M in the thousands.)
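The comparison between K*M and K+M can be tried out directly. The following is only a minimal sketch of the rough estimate given above: the function name `mapping_counts` and the sample values are illustrative assumptions, and the counts ignore the extra mechanisms needed to combine the two sets of mappings.

```python
def mapping_counts(k: int, m: int) -> tuple[int, int]:
    """Return (direct, factored) item counts under the rough estimate in the text.

    direct   -- K*M items: one stored sensorimotor mapping per combination
    factored -- K+M items: sensory-to-environment plus environment-to-motor mappings
    """
    return k * m, k + m


# Illustrative values only; any K and M "in the thousands" will do.
for k, m in [(1_000, 1_000), (2_000, 5_000), (10_000, 10_000)]:
    direct, factored = mapping_counts(k, m)
    print(f"K={k:>6,}  M={m:>6,}  K*M={direct:>12,}  K+M={factored:>7,}")
```

The gap widens as K and M grow, because the direct scheme grows with their product and the factored scheme only with their sum.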

Of course, the latter solution, making use of the exosomatic ontology, requires extra mechanisms for combining the mappings appropriately. But it also gives extra benefits, including the ability to represent future or remote possibilities independently of how they are sensed or produced, as discussed elsewhere in connection with fully deliberative systems.

If the precocial animals do use such an exosomatic ontology then that demonstrates that, at least over evolutionary time-scales, an exosomatic ontology, including condition-consequence contingencies required for acting in a 3-D world, can be acquired using only a very general learning mechanism with no specific commitments to any particular ontology, i.e. the mechanism of evolution. I.e. if some precocial species have an exosomatic ontology at birth, then that demonstrates that given millions of years and huge numbers of individuals a general learning mechanism can produce an exosomatic ontology.

So, given that some species (e.g. humans and perhaps other altricial species, such as primates, hunting mammals, nest-building birds) are apparently born without that sort of representational competence, but seem to acquire it within a few months or years, the question is

Using material from various discussions in which I have attempted to explain the difference between somatic and exosomatic learning, this paper lists some possible reasons why different sorts of people fail to appreciate the distinction: e.g. some are concept empiricists deeming the second half of the distinction inconceivable, while some use the phrase 'sensorimotor' so broadly as to cover both categories, not realising the importance of the subdivision they are not attending to.

Various examples are presented that illustrate the distinction and its importance for robotics, for psychology and neuroscience, and for understanding the evolution of intelligence. This elaborates on some of the points made in a discussion document on 'Orthogonal Recombinable Competences', analysing some of the requirements for developing normal competences of a young child.

[1] Terminology
In my original attempts to explain the distinction I argued that sensorimotor contingencies were inadequate because some intelligent systems also required what I called 'objective condition/consequence contingencies'. But I had great difficulty getting some people to see the point. Eventually I realised that there were different sources of resistance to the point, and that my terminology got in the way of overcoming that resistance.

It is possible that some people would understand the distinction better if I used different labels for the contrast, e.g. instead of talking about

for some people it is easier to communicate if I use this distinction: and for others this distinction may be more appropriate:

The egocentric/allocentric distinction (discussed below) is a different distinction, though it could be confused with this one.

Since I first started working on this document early in 2006, I have gradually moved towards making more use of the words 'somatic' and 'exosomatic' mainly because they do not carry prior theoretical baggage for most people.

[2] Ambiguity: Broad and narrow notions of 'sensorimotor contingencies'.
One source of communication problems surprised me. It turned out that some people use the label 'sensorimotor contingencies' so broadly as to include both categories that I am trying to distinguish. Such people cannot see the point if they already use 'sensorimotor' (sometimes written 'sensory-motor' or 'sensory motor') to subsume what I call the amodal/objective condition/consequence contingencies. For such people, my task is to help them understand the importance of distinguishing two very different sub-categories within a broad category that they already use.

For others the label may be used in the narrow sense. For such people, my task is to help them see the importance of a variety of phenomena that they are ignoring.

NOTE -- history of the word 'sensorimotor':
As far as I know the label 'sensorimotor' (or rather a French version) was first introduced by Jean Piaget, whose work has influenced me deeply over many years even when I do not agree with him. I have not checked whether he intended it to refer only to what I've called 'somatic' information. My suspicion is that he was not sufficiently precise about this, as he was using a very different conceptual framework from ours.

NOTE -- What do I mean by 'concept':
To a first approximation a concept can be thought of as
a re-usable component of information that can be combined with others to produce propositions, goals, questions, intentions, desires, memories, and many other kinds of mental entities with information content.
Some concepts are atomic and indivisible whereas others require parameters in order to be used, e.g. 'efficient' (for what?), 'tall' (relative to what comparison class?). Some are highly abstract and topic neutral, such as logical concepts, 'not', 'or', 'implies', 'all', 'most'.
If you feel comfortable with all my uses of the word 'concept' so far, including that rough, partial, definition, you could try skipping to the next section but you should read this note if you do not feel clear about the notion 'concept'. It's a complex and confusing notion.
I make heavy use of the word 'concept' in this paper, though I do not attempt to define it. My use of the word involves a number of assumptions, including the following.
  • There are various kinds of mental states and processes that have information content, including beliefs, desires, plans, percepts, generalisations, memories of specific episodes.
  • The information contents are typically not atomic, indivisible entities, but have internal structure, and different mental states and processes may share common elements. E.g. you may see something, you may want to hold that thing, you may be afraid of holding it, you may later wonder where it is, you may remember having bumped into it in the past. Each of those types of mental states can also be applied not just to an individual object but to a type that is common to many instances (e.g. 'apple', 'grasping', 'shadow').
  • In some cases, though not all, the operation of a mind or brain depends on those different states and processes sharing some structure that refers to that type of entity or to that individual entity. Those are two types of concepts (type and token concepts) that can be shared between different mental states.
    (Not in all cases because different parts of the brain may build their own representations of the same thing and not communicate with each other except through their interactions with the environment, not through shared information structures. Making them communicate introduces the 'binding' problem, solved in different ways in computers, including use of pointers, use of patterns that can be matched, use of unique global symbols, etc. Brains may use far more complex methods to solve this problem.)
  • There are many other sorts of concepts, referring to properties, relations, states, processes, functions, spatial and temporal regions, routes, and mental states of oneself and others.
  • Many (some would say 'all') concepts have two aspects discussed in various ways by philosophers like Mill, Frege, Russell, Strawson and others. The first aspect is the thing or entity or class of things in the world that corresponds to the concept (often called 'denotation', 'extension', 'reference', or 'Bedeutung' (Frege)). The second aspect is much harder to define but is concerned with how the concept refers to those things rather than something different (often called 'connotation', 'intension', or 'Sinn' (Frege)). Some would add other features, e.g. associations of the concepts, the mode of representation.
Some random further reading

Wikipedia on 'Sense and reference'
Frege's article

Conceptual Spaces: The Geometry of Thought by Peter Gärdenfors

I have tried to make this notion of "concept" neutral as to how the information is actually represented, whether it is propositional, pictorial, in activation states of some neural mechanism, or distributed over synaptic strengths. I am also trying to be neutral as to whether they need to be discrete or continuously variable, whether they are parametrised, whether they form tree-structures or other forms of organisation, and whether they are expressible in any kind of notation or formalism known to humans. I do not assume that everything that is thought, perceived, desired, etc. by any animal or machine must be expressible in English, or French, or Urdu, etc.

There are philosophers who discuss the possibility of non-conceptual thought. So far I have not been able to make sense of this notion, since it seems to me that for such thought to play a role in a working animal or machine it would need to have the structural relations to other thoughts, percepts, desires, etc. in terms of which I have introduced the notion of concept used here. If they mean only that the concepts are not based on or applied in use of human (public) language, then all concepts used by pre-linguistic children and animals that do not talk would be non-conceptual! Of course, the use of shared structure or shared content between information states presupposes the use of some sort of internal formalism even in non-linguistic animals and prelinguistic children. As far as I know there are no good theories around as to what those internal formalisms might be like. Fodor's book 'The language of thought' proposed a totally implausible answer which, as far as I know, nobody, not even Fodor, believed.

Most people will find my explanation of what 'concept' means very unfamiliar and counter-intuitive. Nevertheless I believe that the idea I have tried to articulate, namely the idea of reusable components shared between different kinds of information contents of different mental states, is close to the use of the word as referring to the building-blocks of thought over centuries of philosophy. However I have generalised that a little to allow for forms of information processing older philosophers never dreamt of. There are many people who write as if every concept were a concept of a type of object: namely an isolatable, possibly enduring, component of reality, such as tomatoes, trees, tornadoes. But that leaves out relation concepts, function concepts, many concepts referring to abstractions, such as arithmetical division and prime numbers, and logical concepts, and is therefore unacceptable.

You can see how loose and diverse existing usage is if you give Google the query 'define:concept' and follow up some of the links. Some of the confusion arises because some people allow concepts to be propositions, and therefore capable of being true or false, whereas others restrict the label to building blocks of propositions, so that concepts may be usable or useful, well defined or coherent or ambiguous or incoherent, but never true or false. (An incoherent concept would be 'the direction in which the whole universe is moving'.) A little more, but not much more, coherence can be found here

[3] Links with philosophy of science: theoretical concepts

Some people fail to understand the distinction being discussed here because they regard the 'exosomatic' half of it as incoherent. That is because they are (often unwitting) concept empiricists. What they don't know is that concept empiricism is an old idea that was refuted first by Kant and then more comprehensively by 20th century philosophers of science.

Concept empiricists assume that somatic sensorimotor contingencies must exhaust all empirical knowledge about the environment, and therefore all the knowledge acquired by animals or robots during infancy because they believe that every concept used by an animal or robot must have been derived by some form of abstraction from experience of instances (plus abbreviated explicit definitions using previously acquired concepts). It is an old and venerable theory, but that does not make it true!

The label 'concept empiricism' has a number of different uses. The principal use of the phrase that I am concerned with (explained briefly in this tutorial for a philosophy of AI course) refers to the philosophical theory that there are two kinds of concepts: those that are derived from experience by abstraction from instances (primitive concepts) and concepts defined in terms of those primitive concepts using logical operators like 'and', 'or', 'not', and possibly all the apparatus of predicate calculus. For example, if 'bigger than', and 'red' are both primitive empirical concepts derived from experience of things bigger than other things, and of red things, then a logically educated concept empiricist would allow 'bigger than something red' as an acceptable concept.
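The compositional step that concept empiricism allows can be sketched in code. This is only an illustration of how a compound concept is built from primitives with logical operators; it says nothing about how the primitives would be acquired, which is the point in dispute. The class `Thing`, its fields, and the function names are hypothetical, introduced purely for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """A hypothetical object with just the properties the example needs."""
    name: str
    size: float
    colour: str


# Two 'primitive' concepts, stipulated here rather than learnt from instances.
def is_red(x: Thing) -> bool:
    return x.colour == "red"


def bigger_than(x: Thing, y: Thing) -> bool:
    return x.size > y.size


# Compound concept: 'bigger than something red', i.e.
# there exists y in the domain such that red(y) and bigger_than(x, y).
def bigger_than_something_red(x: Thing, domain: list[Thing]) -> bool:
    return any(is_red(y) and bigger_than(x, y) for y in domain)


domain = [Thing("cherry", 1.0, "red"), Thing("pea", 0.5, "green"), Thing("melon", 20.0, "green")]
print(bigger_than_something_red(domain[2], domain))  # melon vs. the red cherry
print(bigger_than_something_red(domain[1], domain))  # pea vs. the red cherry
```

Everything in the compound definition is a logical operator (existential quantification, conjunction) or a previously available concept, which is all the concept empiricist permits.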

The claim of the strong form of concept empiricism going back to David Hume and his predecessors is that every concept anyone uses is either an indefinable (primitive) concept understood on the basis of abstraction from experience of instances or a compound concept defined (by our faculty of 'imagination') using only logical operators and primitive concepts (or previously defined concepts of the same general sort). E.g. if you have experience of a horn growing out of the head of an animal and experience of a horse, then your imagination can combine them and create the concept of a unicorn. This process of combination of concepts can be repeated indefinitely. But according to concept empiricism all such concepts are ultimately defined in terms of concepts derived from experience.

People who find concept empiricism tempting as a philosophical theory are mostly ignorant of the difficulties in concept empiricism identified by philosophers like Immanuel Kant (1781), who argued that you cannot have any experience without using concepts, so that some concepts must be acquired independently of experience, and more recently by 20th century philosophers of science such as Carnap, Hempel, Popper, Pap, and others, who realised that scientific theories with deep explanatory power typically use concepts that cannot be derived from experience of instances by an abstraction process plus use of explicit definitions of new concepts in terms of old ones. The technical label for such concepts in the philosophy of science literature is, unsurprisingly, 'theoretical concepts'. For more on this see the previously mentioned tutorial.

My claim is that there are many theoretical concepts, including not only the concepts of deep scientific theories, but many everyday concepts, like 'curved surface', or 'rigid object', which can neither be understood solely on the basis of experience of instances, nor defined in terms of other concepts that can be.

The idea is related to what Einstein described as
the essentially constructive and speculative nature of all thinking and more especially of scientific thinking
Quoted in a very interesting online paper by Don Howard in the Stanford Encyclopedia of Philosophy, which goes on to report Einstein as arguing that:

if theory choice is empirically determinate, especially if theoretical concepts are explicitly constructed from empirical primitives, as in Carnap's program in the Aufbau (Carnap 1928), then it is hard to see how theory gives us a story about anything other than experience. As noted, Einstein was not what we would today call a scientific realist, but he still believed that there was content in theory beyond mere empirical content. He believed that theoretical science gave us a window on nature itself, even if, in principle, there will be no one uniquely correct story at the level of deep ontology...
Notice that one of the (deceptive) attractions of concept empiricism is that it seems to give us a guaranteed way of getting at useful concepts: derive them all from experience by abstraction processes followed by explicit definitions to extend the initial set. But, as Einstein noted, the deep concepts of science cannot be derived that way.

[4] Concept Empiricism: Sample philosophical papers
For people who are new to these philosophical disputes, here are two papers, one on each side:
a recent paper by a philosopher defending concept empiricism (PDF)
The Return of Concept Empiricism
[Penultimate draft of chapter in H. Cohen and C. Lefebvre (Eds.) Categorization and Cognitive Science, Elsevier (forthcoming).]
Jesse J. Prinz
Department of Philosophy, University of North Carolina at Chapel Hill
a recent paper by a philosopher attacking concept empiricism (PDF)
Concept Empiricism: A Methodological Criticism (to appear in Cognition )
Edouard Machery
University of Pittsburgh
Department of History and Philosophy of Science

I (AS) wrote a couple of anti-concept empiricist papers about 20 years ago in the context of trying to explain how it was possible for computers to use symbols to refer, one in 1985 (What enables a machine to understand?) and one in 1986 (Reference without causal links). Some of the points in those two papers are summarised in connection with 'The main objection' below.

NOTE: more terminological problems
The symbol-grounding hypothesis and some versions of concept empiricism not only require concepts to be ultimately derived from experience, they also require them to be represented in some format that is closely related to the content of sensory arrays. I believe it would be better to separate out the question of origin from the question of format.

In a paper written in 1971 I distinguished 'Fregean' from 'analogical' representations, also pointing out that these are not exhaustive categories. The distinction is concerned with how complexity of a type of representation determines what it represents. Fregean representations use the function/argument structure to determine semantic content (e.g. using something like standard Tarskian compositional semantics -- as foreseen by Frege), whereas analogical representations are those in which properties of and relations in and between the representations correspond to properties of and relations between things represented.
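A toy contrast may help fix the distinction. The sketch below is an assumption-laden illustration, not taken from the 1971 paper: the names `fregean_facts`, `analogical_row` and the two query functions are hypothetical. In the first representation, content is carried by function/argument structure; in the second, the order of items in the representation corresponds to the left-to-right order of the things represented.

```python
# Fregean: each fact applies a relation symbol to arguments.
# The layout of the set itself carries no meaning.
fregean_facts = {("left_of", "A", "B"), ("left_of", "B", "C")}


def fregean_left_of(x, y, facts=fregean_facts):
    """True only if the fact is explicitly stored; no inference is attempted."""
    return ("left_of", x, y) in facts


# Analogical: position in the list mirrors position in the scene.
analogical_row = ["A", "B", "C"]


def analogical_left_of(x, y, row=analogical_row):
    """Read the relation off the structure of the representation itself."""
    return row.index(x) < row.index(y)


print(fregean_left_of("A", "C"))     # not stored, so not found without a transitivity rule
print(analogical_left_of("A", "C"))  # available directly from the ordering
```

The Fregean version needs an explicit rule (transitivity) to recover a relation that the analogical version gives for free, at the cost of the analogical version being tied to one kind of relation.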

I don't see any reason to link the analogical form with the notion of being derived from experience of instances: e.g. many mathematicians and computer programmers use graphs and flow-charts as analogical representations of very abstract things like morphisms (in Category theory) and virtual machine transitions in software systems. Likewise there is no reason why an atomic symbol in a Fregean representation e.g. 'pink' in "This apple is pink" should be incapable of corresponding to a concept derived from experience. (Except that I don't believe colour concepts can be simply derived from experience of instances, but that's another topic.)

It is also a mistake to assume that analogical representations are necessarily 'perceptual', or related to the structure of sensory arrays. A transfinite ordinal (e.g. the natural number sequence) can be (and has been) used as an analogical representation of a form of computation but I don't think anyone believes transfinite ordinals can occur in sensory arrays, at least not in this universe. People who have encountered the Schroeder-Bernstein theorem in set theory may recall that an intuitive proof can use an analogical model of two reflective surfaces with light rays bouncing between them infinitely many times: analogical but not capable of being sensed. It is also capable of being proved (some would say more rigorously) using a Fregean notation.

Another problem is that some people assume that perceptual concepts must somehow be expressed in terms that relate to the sensory input patterns involved in the relevant form of perception. But that is based on a shallow theory of perception. I have argued elsewhere (here in 1978, and more recently here) that, for example, a visual architecture is multi-layered, and processes information at different levels of abstraction, some of which are only very loosely related to the sensory input. E.g. when a child sees a toy train going through a tunnel much of what it sees is not sensed!

[5] So-called Symbol Grounding
Cognitive scientists and roboticists who are concept empiricists (witting or unwitting) tend to want to talk about 'symbol grounding' as a requirement for symbols to have meaning. The belief that such 'grounding' is possible is, as indicated above, one of the attractions of concept empiricism.
This notion is criticised and contrasted with 'symbol tethering' in this PDF slide presentation:
Getting meaning off the ground: symbol grounding vs symbol attachment/tethering
All I am saying to those who don't have the concept empiricist prejudice and who merely use the label 'sensorimotor' very broadly, is that there is an important sub-division of cases, namely between

The latter requires the animal or robot to be capable of using more sophisticated mechanisms and a richer, more powerful, ontology.

Notice that there are different ways in which the ontology can be richer. Added richness may be required for a 3-D environment as compared with a 2-D environment in both:

Philippe Rochat's book The Infant's World presents evidence that even within the first four months human infants have some version of that richer ontology, referring to independently existing things in the environment, e.g. objects with surfaces. Exactly what sort of ontology they have, how they represent it, how they use it and extend it, probably cannot be discovered using only the methods available to developmental psychologists.

There are other ways in which the ontology used to represent information about the environment can go beyond patterns and associations within and between sensory and motor signal arrays, including referring to materials of which things are made, referring to causal powers of things (e.g. elasticity, electrical resistance, and affordances for actions), and also referring to internal states and processes in other more or less intelligent agents. Also deep scientific theories extend such ontologies even further.

[6] Reducibility: Can (exosomatic) condition/consequence contingencies be reduced to (somatic) sensorimotor contingencies?
It may be that some people regard the distinction as real, but not very important because the objective concepts can be in some sense reduced to the subjective ones, and as a result can be discovered automatically by some general purpose learning mechanism that is looking for economical ways of encoding the subjective sensorimotor contingencies. (Einstein strenuously denied this, in the quotation above.)

The claim that the objective concepts can somehow be reduced to the subjective ones is a generalisation of Hume's view, allowing that some concepts that are not derived from experience are definable in terms of others that are.

An example might be a learning system observing 2-D projections of rotating and moving 3-D wire-frame cubes and initially able merely to use some characterisation of the 2-D space, and motions in it (e.g. collections of 2-D vectors for points, and 4-D vectors for lines, with all motions represented as changes in values of those vectors).

For a web site showing a cube rotating see:
Can you see where the axis of rotation is in 3D? Is it static or does it move? How is it oriented in relation to the viewing plane?
Try describing what is happening by referring only to the 2-D processes in the image, never mentioning anything that is not happening in the image plane. E.g. without thinking about the faces of the cube can you say how many lines are visible at different times?
Artists have to learn to pay attention to such things. Most people ignore most of them and perceive only the 3-D structure represented. Perhaps some kinds of brain malfunction could remove the ability to see the 3-D structures and processes, leaving only the 2-D competence. Is this a feature of some kinds of autism?

If such a system had a way of recording observed 2-D processes and searching a space of possible representations for a more economical representation, it might find that by going for a more complex representation of the static structures, adding a dimension so that points require 3-D vectors and line segments require 6-D vectors, it can then find far more economical representations of the observed 2-D processes, including representing them using algorithms that allow relatively simple predictions to be made. E.g. if a 3-D cube is rotating about a specified 3-D axis at 10 degrees per second, then after 30 seconds the resulting 300 degree rotation will determine the locations of all the edges in 3-D and a simple 2-D projection algorithm will specify for each edge where it appears.
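The kind of prediction described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a learning system: the cube is a unit cube centred on the origin, the axis passes through the origin, the projection is orthographic (a 'shadow' obtained by dropping the z coordinate), and all names (`VERTICES`, `EDGES`, `rotate`, `project`, `edges_2d`) are hypothetical. The rotation uses Rodrigues' formula.

```python
import math

# Unit cube centred on the origin: 8 vertices; an edge joins two vertices
# that differ in exactly one coordinate, giving 12 edges.
VERTICES = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]
EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if sum(a != b for a, b in zip(VERTICES[i], VERTICES[j])) == 1]


def unit(v):
    """Scale a 3-D vector to length 1."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def rotate(p, axis, degrees):
    """Rotate point p about a unit axis through the origin (Rodrigues' formula)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    kx, ky, kz = axis
    px, py, pz = p
    dot = kx * px + ky * py + kz * pz
    cross = (ky * pz - kz * py, kz * px - kx * pz, kx * py - ky * px)  # axis x p
    return tuple(c * pi + s * ci + (1 - c) * dot * ki
                 for pi, ci, ki in zip(p, cross, axis))


def project(p):
    """Orthographic projection onto the x-y image plane: drop z."""
    return (p[0], p[1])


def edges_2d(axis, rate_deg_per_s, t):
    """Predict the 2-D image (a 4-number segment per edge) at time t seconds."""
    k = unit(axis)
    pts = [project(rotate(v, k, rate_deg_per_s * t)) for v in VERTICES]
    return [(pts[i], pts[j]) for i, j in EDGES]


# A cube turning at 10 degrees per second about a tilted axis.
for t in (0.0, 1.0, 2.0):
    first_edge = edges_2d((1.0, 1.0, 1.0), 10.0, t)[0]
    print(t, first_edge)
```

The whole 2-D process is generated from a fixed 3-D structure, one axis and one rate; nothing in the code had to be told how any individual image line behaves over time.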

The reducibility claim would be that 3-D structures and processes can be defined in terms of higher order features of 2-D structures and processes, including relationships between such processes.

Notice that the mathematical apparatus required to express such a high order relationship will be quite sophisticated. So defenders of the reducibility claim may be giving up an assumption of an innate, or genetically determined 3-D (objective) ontology, in return for an assumption of very abstract and powerful innate mathematical competence.

I don't know if anyone has already developed a learning system that could discover the 3-D structure in the patterns of motion in a 2-D projection, without having any prior knowledge of the possibility of using an additional dimension. Most automatic learning systems are concerned with dimension reduction.

I have asked many people, including a colleague who is a well known mathematical computer scientist, whether there is any known means by which a general-purpose learning or compression algorithm could discover the 3-D ontology that we take for granted, by looking for higher order patterns in 2-D projections of a 3-D environment, but so far have not found any evidence that anyone knows how this might be done.

Of course, as Achim Jung remarked to me, such a learner moving around a 3-D environment (without realising that it was doing this) might find that some things are constantly changing their shape (i.e. because the 2-D projections change) whereas a small subset do not (i.e. spheres always project to circles), though they may change their size. But it would require an additional creative step to invent the ontology that we use to explain why this is so -- if the learner did not start with an implicit presumption that the environment is three dimensional (which might be the result of millions of years of evolution).

This description of the process will be far simpler than, for example a description that finds patterns of motion of different lines and works out for each of the 12 lines how its behaviour changes predictably. The relative economy of the 3-D representation will be even greater if the whole cube can be translated and if the axis of rotation can change its location and orientation in 3-D space.
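The economy claim can be illustrated with a deliberately crude count of stored numbers. This is only a sketch under stated assumptions: it counts numbers rather than measuring algorithmic complexity, it assumes a rigid cube rotating at a fixed rate about an axis through the origin, and the function names are hypothetical.

```python
def numbers_for_2d_record(frames, segments=12):
    """Each 2-D line segment needs two endpoints, i.e. four numbers, in every frame."""
    return frames * segments * 4

def numbers_for_3d_description(vertices=8):
    """3-D vertices (3 numbers each), plus an axis direction (3) and a rotation rate (1)."""
    return vertices * 3 + 3 + 1

# Usage: compare the two descriptions for a 100-frame recording.
record_size = numbers_for_2d_record(100)      # grows with the number of frames
generative_size = numbers_for_3d_description()  # independent of the number of frames
```

Allowing the cube to translate, or the axis to move, would add a few more numbers to the 3-D description, but still a fixed number rather than one that grows with every frame.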

Notes on Kant, Minsky, Frames and Aspect-Graphs

I think when Immanuel Kant wrote his Critique of Pure Reason (CPR) (circa 1780) he understood the point I am making, though he lacked some of the ideas we now have regarding possible explanatory mechanisms.

This insight is shown in his discussion of the appearances of a house and how they can change as you move around the house and up and down the house.
Search this link to extracts from CPR (translated from German) (here is another link) for occurrences of 'house', e.g.

Now if appearances were things in themselves, then since we have to deal solely with our representations, we could never determine from the succession of the representations how their manifold may be connected in the object. How things may be in themselves, apart from the representations through which they affect us, is entirely outside our sphere of knowledge. In spite, however, of the fact that the appearances are not things in themselves, and yet are what alone can be given to us to know, in spite also of the fact that their representation in apprehension is always successive, I have to show what sort of a connection in time belongs to the manifold in the appearances themselves. For instance, the apprehension of the manifold in the appearance of a house which stands before me is successive. The question then arises, whether the manifold of the house is also in itself successive. This, however, is what no one will grant.

He points out that although there are many possible 'subjective' sequences of views, the possibilities are constrained by something 'objective' which is not directly perceived but has to be assumed as underlying and explaining the appearances and their constraints, namely the 'objective' structure of the house itself. He might have added that laws of geometry, topology, and the physics of vision all play a role in determining the constraints. (He did not know about closed-circuit TV, for instance, which allows additional sequences of views to be experienced.)
(Eavesdrop on some philosophers discussing this passage here.)

It looks as if Kant had some of the key ideas concerning 'frame systems' about 200 years before Minsky showed their importance in A Framework for Representing Knowledge (1974).

It is interesting to note that Minsky distinguishes two questions:

is imagery symbolic and is it based on two-dimensional fragments?
and comments on the second question thus:
the issue of two- vs. three-dimensions evaporates at the symbolic level. The very concept of dimension becomes inappropriate. Each type of symbolic representation of an object serves some goals well and others poorly. If we attach the relation labels left of, right of, and above between parts of the structure, say, as markers on pairs of terminals, certain manipulations will work out smoothly; for example, some properties of these relations are "invariant" if we rotate the cube while keeping the same face on the table. Most objects have "permanent" tops and bottoms. But if we turn the cube on its side such predictions become harder to make; people have great difficulty keeping track of the faces of a six-colored cube if one makes them roll it around in their mind.

If one uses instead more "intrinsic" relations like next to and opposite to, then turning the object on its side disturbs the "image" much less. In Winston we see how systematic replacements (e.g., of "left" for "behind," and "right" for "in-front-of") can simulate the effect of spatial rotation.

What Minsky wrote is correct, and we could add that continuous rotation can be simulated by continuous variation of coordinate values in a symbolic representation (as Geoffrey Hinton pointed out in response to Kosslyn, long ago). It is important that not all 3-D representations are equally useful for different purposes. E.g. a voxel-based 3-D representation of some object (nowadays achievable using laser range-finders and other technologies) is good for producing images from multiple viewpoints, but not directly useful for planning actions using the object -- e.g. selecting grasping points.
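The systematic replacements mentioned in the passage quoted from Minsky can be sketched as a relabelling of viewer-relative relations. This is an illustrative toy, not Winston's program: the relation names, the `turn` function and the scene are assumptions, and only the first two entries of the table come from the quoted example.

```python
# Hypothetical relabelling for a quarter turn about the vertical axis.
# The first two entries follow the example quoted from Minsky; the other
# two are an assumption added here to complete the cycle.
QUARTER_TURN = {
    "behind": "left-of",
    "in-front-of": "right-of",
    "left-of": "in-front-of",
    "right-of": "behind",
}

def turn(relations, quarter_turns=1):
    """Relabel viewer-relative relations; leave all other relations untouched."""
    for _ in range(quarter_turns % 4):
        relations = {(QUARTER_TURN.get(name, name), a, b) for name, a, b in relations}
    return relations

# A scene described by (relation, thing, other thing) triples.
scene = {
    ("left-of", "red-face", "blue-face"),      # viewer-relative: changes under rotation
    ("opposite-to", "red-face", "blue-face"),  # 'intrinsic': unaffected by rotation
    ("above", "top-face", "table"),            # unaffected by turns about the vertical
}

after_one_turn = turn(scene, 1)
```

The 'intrinsic' relation survives every turn unchanged, which is the sense in which such relations are disturbed much less by rotation.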

My point is that we can distinguish an ontology that is based on the assumption of 3D space (plus time) and that permits the representation of 3D structures, relationships, and processes and the 'exosomatic' contingencies relating changes in those relationships, and an ontology that is concerned only with the contents of sensory and motor signals and their relationships at various levels of abstraction including the 'somatic' contingencies.

I suggest that evolution 'discovered' that there are enormous benefits of economy in the former, for some animals and robots, but not all (e.g. not for microbes and maybe not for all insects). See also the discussion of grasping in this paper on orthogonal competences.

Some of the ideas of frame systems seem to have been independently invented by Koenderink and van Doorn in their concept of an 'Aspect graph':
J. J. Koenderink and A. J. van Doorn
The singularities of the visual mapping
Biol. Cybern., 24:51-59, 1976

I believe that a generalisation of the ideas of aspect graphs/frame systems can be used to represent information about affordances, but that's a topic for another occasion. It requires including many more kinds of condition-consequence rule than the rules about how movement of objects or viewpoints change appearances. Dealing with articulated and flexible objects adds further problems to be addressed.

[7] Nature nurture issues
Clearly biological evolution found a way of producing systems able to represent 3-D structures and processes, as humans certainly can. Whether that has to be re-discovered by each individual human, using a learning algorithm provided by evolution, or whether the result of the learning was direct provision of a 3-D ontology in the genome is an empirical question. Since some animals are born with sophisticated 3-D competence (e.g. newly hatched chicks walk around pecking for food, and can imprint on and then follow a hen that is recognised from different viewpoints, and new-born deer can very quickly walk to the nipple to feed and can also very soon run with the herd), it seems at least possible that an ontology for a 3-D environment is transmitted via the genome, perhaps along with mechanisms for extending that ontology.

If that is right, some animals seem to be capable of much richer and more diverse extensions than others.

It is possible that some of that ontology is constructed by some sort of learning process prior to birth or hatching while the chick is still growing in the egg or the deer growing in the womb. However, the visual information available to either of those will be quite unlike what is available immediately after birth or hatching, so what is learnt in that case will not be extracted from visual sensorimotor patterns.

[8] Different languages use different ontologies
Perhaps it is useful to remind people of the differences between
(a) The language of physics (chemistry, engineering, geology, etc.) which refers to things, events, processes in the *environment* and their consequences in the environment, including some things that are not observable using unaided senses, e.g. mass, specific heat, elasticity, chemical composition, gravitational attraction, electromagnetic fields, atoms, electrons, neutrons, etc.

(b) The sensorimotor language that many AI people and cognitive scientists think has to be taken as primitive and the basis of everything else (as did many philosophers, e.g. Locke, Berkeley, Hume), namely a language referring not to what's going on in the environment, but only to patterns in sensory and motor arrays *within the perceiver*, and relations between them.
[I think Piaget used 'sensorimotor' in this restricted way, claiming that infants did not get beyond sensorimotor learning for many months -- criticised by Rochat and others on the basis of recent evidence.]

My suggestion is that from a very early stage infants use a genetically determined ontology that includes some subset of 3-D structures and processes, but learn a great deal more about the variety of possible contents of 3-D space through play and exploration. For more on this see the 'Orthogonal recombinable competences' web site.

One of the important kinds of development of the infant ontology is learning that there are different kinds of physical stuff (e.g. skin, wood, plastic, cloth, water, mud) not all of which can be discriminated simply by their static appearance, and which have various properties that are not directly available to the senses without performing actions on the objects, such as rigidity, hardness, fragility, elasticity, viscosity, stickiness, weight, solubility, etc. The process of learning to extend the ontology to include such things continues into adult life and merges into scientific enquiry.

[9] The relevance of ambiguous figures and movies
Most 2-D paintings and drawings unambiguously depict a 3-D scene. However there are many famous ambiguous drawings and diagrams that are easily interpreted as representing (at least) two very different 3-D structures, including the Necker cube and the Old/Young woman and the Duck rabbit. The first and the last are included here
[Figure: Necker cube and duck-rabbit]

Some of those displays are bistable, like the Necker cube. When the interpretation flips, the sensory contents (in the sensory input signals) do not change, but the 3-D interpretation, which is not in the sensory signals, does change.
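A small sketch can make the point concrete: two different 3-D structures can produce exactly the same 2-D input. The code below assumes an orthographic projection and an arbitrary oblique viewing angle; the function names and the convention that larger z means nearer to the viewer are assumptions made for illustration.

```python
import math

def oblique_view(point, yaw_deg=30.0, pitch_deg=20.0):
    """Turn a point about the vertical axis, then tilt it, to give an oblique view."""
    x, y, z = point
    a, b = math.radians(yaw_deg), math.radians(pitch_deg)
    x, z = x * math.cos(a) + z * math.sin(a), -x * math.sin(a) + z * math.cos(a)
    y, z = y * math.cos(b) - z * math.sin(b), y * math.sin(b) + z * math.cos(b)
    return (x, y, z)

def image_of(points):
    """Orthographic projection: only x and y reach the 'sensor'."""
    return [(round(x, 9), round(y, 9)) for x, y, _ in points]

def nearest_corner(points):
    """Index of the corner closest to the viewer (larger z is taken to be nearer)."""
    return max(range(len(points)), key=lambda i: points[i][2])

CORNERS = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
interpretation_a = [oblique_view(c) for c in CORNERS]
interpretation_b = [(x, y, -z) for x, y, z in interpretation_a]  # depth reversed

same_image = image_of(interpretation_a) == image_of(interpretation_b)
same_structure = nearest_corner(interpretation_a) == nearest_corner(interpretation_b)
```

Here `same_image` is true while `same_structure` is false: which corner is nearest is a fact about the 3-D interpretation, not about anything in the 2-D image.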

I believe it is constructed in a part of the visual system that evolved relatively late and runs in parallel with lower level portions that merely deal with the given 2-D information structures and movements. Seeing faces as happy or sad, or objects as rigid or flexible, uses other extensions to the visual system that can run concurrently with the lower level sub-systems. That is why in diagrams of the architecture of a human-like intelligent agent I show the perceptual (and motor) systems as multi-layered, e.g. in the H-Cogaff architecture.

Compare the rotating cube with a lot of dots rotating about a point which is moving. In the latter case, it is possible not to notice that anything more is happening than a lot of moving dots. Maybe some forms of brain damage would prevent the motion pattern being seen. But IF the existence of the common centre of rotation is detected then its description does not require going outside the ontology of relations in a 2-D domain, as representing, experiencing, thinking about, or seeing a rotating 3-D cube does.

G. Johansson's 'biological motion' movies

Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, 14, 201-211.

For a useful overview see this presentation (requires Powerpoint or OpenOffice)
A couple of demos are included here.
And an adjustable one here.

Johansson showed that if lights are attached to joints of one or more human beings (or parts of other animals) and their movements are recorded on film, people watching the movie see only a 2-D array of lights when they are static but as soon as motion begins that immediately triggers a rich 3-D interpretation including people doing things. Of course the 2-D lights continue to exist, and to move, and the detection of those movements in the sensory input to the visual system must be part of what goes on.

But the motion seems to invoke the activity of a specialised subsystem of the visual system that goes beyond what is in the sensory input stream to construct percepts involving 'objective' 3-D structures and processes including in some cases social activities such as dancing or boxing, or other intentional activities such as doing press-ups or climbing up and down a ladder. This requires an ontology that goes far beyond the contents of sensory arrays.

It is possible that some animals would not perceive the 3-D relationships and processes: for them the Necker cube would just be a collection of lines in a 2-D plane, and the Johansson movies would merely contain moving dots. (Maybe some autistic people too?)

A visual system that sees the 3-D structures and processes needs a richer ontology than one that merely describes 2-D structures and processes. It needs extra representational power. There is neither a requirement nor any logical possibility of providing that power by defining the exosomatic, amodal, semantic content in terms of the somatic sensorimotor information. Of course, evolution arrived at some systems that had these exosomatic representational capabilities via organisms that initially had only somatic sensorimotor control mechanisms (as some theorists still think fully describe brain functions). But evolution is far more creative than a system restricted to deriving all its concepts from some initial set.

[10] Issues for brain science
More needs to be said about how all this could be implemented in brains. In particular, there are many ways in which things discussed here undermine any simple mapping between sensory signals and percepts, e.g. mappings between patterns of retinal stimulation and visual percepts.

More is needed than just going from 2-D vectors to 3-D vectors. E.g. in a 3-D world some things can exist but not be seen because they are obscured by something else. This cannot happen in a 2-D world, though things can cease to exist and then be replaced by something similar later.

Likewise, in a 3-D world part of an object can exist while obscured by another object, though that is not possible in a 2-D world (except for things seen edge-on projected to a 'linear retina' in the plane).

It is also possible in our 3-D world to see one thing through another if the second is made of water, glass or some other transparent substance. A more complete discussion would need to comment on effects that can be produced by the intervening object.

There are also properties of materials that can be perceived, e.g. fragility, rigidity, flexibility, etc. even though we don't have any sensor signals registering those properties. Rather these have to be interpretations going beyond the sensory information (compare Kant).

[11] Compression based ontologies?
It may be that, as discussed above, without being designed to have a 3-D ontology initially a system can discover the need to create one by starting from 2-D information and attempting to minimise the algorithmic complexity (Kolmogorov complexity) of the description of 2-D processes. (That would require specific 'innate' competences, of course. I know of no evidence that this can be done except by searching in a predetermined space that includes the possibility of using a 3-D ontology for locations, orientations, surface structures, etc.)

But even then, that new more economical representation (referring to different distances from the viewer, and planes in different orientations from the viewing plane, things continuing to exist while unperceived) uses a significantly different ontology from the original, just as a description of currents and voltages in an electric circuit is different from a description of readings on dials of measuring devices.

(Attempts by some empiricist physicists and philosophers, like P.W. Bridgman, to reduce the one to the other hit a brick wall early in the 20th century.)

Likewise discovering relations between temperature and resistance, or temperature and elasticity is not a matter of discovering sensorimotor contingencies (in the narrow sense) even if there are ways of changing temperature by acting on objects and ways of finding out electrical resistance or elasticity by observing them.

[12] The main objection
There is a major objection from concept empiricists to the notion that anything in an animal or robot can refer to something that that individual cannot ever experience as something in its sensory input (possibly including hallucinatory or dream experiences, depending on the variety of empiricism being defended).

The argument is roughly that for something S in the machine or organism M (e.g. a symbol or a state of a neural sub-mechanism, or anything else in M, or in a virtual machine in M) to be able to refer to some entity X rather than anything else, there must be some causal link between the referring entity S (or occurrences of its use in M) and the entity X.

It is then suggested that unless S has been associated with X insofar as both occur within M, or S is defined in terms of other things that already refer to parts of X or things related to X, M will not be able to use S to refer to X.

There are two things wrong with this. First, it assumes without argument that semantic reference requires some sort of causal association. This is obviously false, since it would rule out reference to things that did not happen (such as the accident I avoided on the way to work), since non-existent things cannot be causes or effects. It would also rule out reference to abstract entities that cannot be causes or effects, such as the number pi, Fermat's last theorem, or the infinitely many proofs, designs, or programming languages waiting to be discovered or invented.

Second, it ignores the fact that much of science, in fact all deep science, is full of references to entities that nobody could experience, and which cannot be explicitly defined in terms of possible experiences (which is why Bridgman's operationalist philosophy of science failed). Examples are neutrino, chemical bond, electromagnetic field, gene, and dark matter. If human scientists can refer to such things there is no reason to assume that robot scientists will need to have them as patterns in the sensorimotor processes in order to be able to refer to them.

How then can something be referred to without using causal links to it? A partial answer building on work of philosophers of science of the last century is to use the fact that a structure (such as a collection of axioms in a formal system) with associated inference rules can, as Tarski pointed out, determine a class of models, with systematic relations between parts of the structure and parts of each model. Extending the structure (e.g. adding non-redundant axioms) reduces the class of models.
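The way added axioms shrink a class of models can be shown with a toy enumeration. This is only a sketch under strong assumptions: a single undefined binary relation symbol, a fixed three-element domain, and axioms written as Python predicates rather than formulas of a formal system. It is not Tarski's apparatus, and all names are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1, 2)
PAIRS = [(a, b) for a in DOMAIN for b in DOMAIN]

def irreflexive(r):
    """Nothing is related to itself."""
    return all((a, a) not in r for a in DOMAIN)

def transitive(r):
    """If a is related to b and b to c, then a is related to c."""
    return all((a, c) in r for (a, b) in r for (b2, c) in r if b == b2)

def total(r):
    """Any two distinct things are related one way or the other."""
    return all((a, b) in r or (b, a) in r for a in DOMAIN for b in DOMAIN if a != b)

def models(axioms):
    """All interpretations of the relation symbol over DOMAIN that satisfy every axiom."""
    found = []
    for bits in product((False, True), repeat=len(PAIRS)):
        r = frozenset(pair for pair, keep in zip(PAIRS, bits) if keep)
        if all(axiom(r) for axiom in axioms):
            found.append(r)
    return found

AXIOMS = [irreflexive, transitive, total]
# Number of models after adding the axioms one at a time.
counts = [len(models(AXIOMS[:k])) for k in range(len(AXIOMS) + 1)]
```

With no axioms every one of the 512 possible relations is a model; with all three axioms only the six strict linear orderings of the domain remain. The symbol has not been defined in terms of anything else, yet what it could mean has been sharply constrained.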

In principle, if the structure is rich enough it could uniquely determine some portion of the physical world, and then parts of the structure would refer to parts of the physical world in the model. In addition, as Carnap pointed out in his book Meaning and Necessity (especially Supplement B: Meaning Postulates), some links with previously defined concepts can help to constrain the meanings of undefined symbols. (For a more detailed summary of Carnap's ideas, see his entry in the internet encyclopaedia of philosophy.)

This is the idea that I have elsewhere (using a label suggested by Jackie Chappell) referred to as 'Symbol tethering'. The following diagram illustrates (metaphorically) the difference between symbol grounding and symbol tethering:

Symbol grounding theory (being a form of concept empiricism) requires every referring item either to refer to something experienced or to be defined in terms of things that refer to things experienced. Symbol tethering allows a structure that determines a class of possible models to be 'pinned down' at a few attachment points, in such a way that vast amounts of potential ambiguity are eliminated. However, as the history of science shows, ambiguity is never totally eliminated: concepts are always capable of being made more precise through further developments, both theoretical and empirical, as pointed out in L.J. Cohen's The Diversity of Meaning (1962) and in chapter 2 of The Computer Revolution in Philosophy (1978).

[13] Burying concept empiricism (and symbol grounding)
Kant put concept empiricism in its coffin, and 20th Century philosophy of science nailed down the lid. The burial comes from work on the altricial-precocial spectrum in animals: precocial species show high degrees of cognitive competence almost entirely innately determined (e.g. deer that run with the herd shortly after birth, birds that peck open the egg and then climb out, etc.). The conceptual apparatus (e.g. use of perceptual categories to control actions) could not have been derived from experience in such species. There are also 'altricial' competences that are the result of play, exploration and learning. The arguments in this discussion paper imply that the learning mechanisms probably include a lot of innate information about the nature of the environment, ways in which it can vary, kinds of hypotheses worth forming, ways of testing hypotheses etc. I.e. the competences are not genetically preconfigured, but are genetically meta-configured.

For more on this see COSY-TR-0502: The Altricial-Precocial Spectrum for Robots (A.Sloman and J.Chappell, 2005).

All of this is much too vague still: we need to show how this can work in practice by building demonstrations of robots that are able, like humans, to refer to things they have never experienced and cannot define in terms of contents or patterns in previous sensory or sensorimotor processes.

Watch this space.

[14] Summary:

This document is an elaboration of one of the themes in the theory of 'Orthogonal competences'.

It may be possible to discuss these issues at ASSC10 (in Oxford, June 23-26) where I shall be presenting a poster based on this abstract:

[15] Egocentric vs Allocentric
The egocentric/allocentric distinction discussed by many philosophers and neuroscientists seems to be different from the contrast between sensorimotor and objective contingencies.

The egocentric/allocentric contrast exists within the framework of the objective ontology referring to an environment that exists independently of the perceiver/thinker/speaker, but is inhabited by different individuals in different places, and possibly with different capabilities.

For example if I describe something as being 'on the left' when I talk to someone facing me, I may be communicating in egocentric mode, referring to my left, or allocentric mode, referring to his left. The latter is often a very important capability, especially when you are communicating with individuals who cannot tell what you are able to see, etc.
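The 'on the left' example can be sketched in a shared 2-D ground plan, which also shows that both modes of description live inside the same objective ontology. The positions, headings and the `side_of` function are assumptions chosen for illustration: the speaker and listener face each other, and the side is read off from the sign of a 2-D cross product.

```python
def side_of(observer_position, observer_heading, target):
    """Return which side of an observer a target lies on, in a shared ground plan."""
    hx, hy = observer_heading
    rx = target[0] - observer_position[0]
    ry = target[1] - observer_position[1]
    cross = hx * ry - hy * rx  # positive when the target is anticlockwise of the heading
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "ahead-or-behind"

# Hypothetical layout: two people facing each other, with a cup off to one side.
speaker = {"position": (0.0, 0.0), "heading": (0.0, 1.0)}
listener = {"position": (0.0, 2.0), "heading": (0.0, -1.0)}
cup = (-1.0, 1.0)

from_my_side = side_of(speaker["position"], speaker["heading"], cup)       # my left
from_his_side = side_of(listener["position"], listener["heading"], cup)    # his right
```

The same cup, at the same place in the environment, is 'on the left' in one mode and 'on the right' in the other; neither description refers to patterns in anyone's sensory or motor signals.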

This paper

Lemay M, Bertram CP, Stelmach GE.
Pointing to an allocentric and egocentric remembered target.
Motor Control. 2004 Jan;8(1):16-32.
reports an investigation into when people use allocentric vs egocentric encodings in connection with pointing tasks.

There are hordes of online papers reporting investigations e.g. into which bits of the brain are involved in egocentric and allocentric representations, and other issues.

As far as I can tell, work on the egocentric/allocentric distinction refers to representation of spatial structure and relationships, including movements, and does not take account of representations of

  • kind of material,
  • properties like elasticity, rigidity, hardness, viscosity, stickiness
etc., discussed in the Orthogonal Competences document.

See also the following, especially section 4:
Pete Mandik, Phenomenal consciousness and the allocentric-egocentric interface
In R. Buccheri et al. (eds.), Endophysics, Time, Quantum and the Subjective, pp 463-485, 2005
World Scientific Publishing Co.

[16] Further Reading

[17] Admin
Maintained by Aaron Sloman
School of Computer Science
The University of Birmingham