The functions of biological vision systems
Installed: 9 Oct 2014
(DRAFT: Liable to change)
Last updated: 11 Oct 2014
This paper is available online.
A PDF version may be added later.
A partial index of discussion notes is in
What are the functions of vision? (Again)
I think there is still a huge amount of research to be done on what humans and
other animals use vision for. E.g. Marr's claim about getting 3-D information
from retinal input is part of the story, but leaves out a large amount. Gibson's
emphasis on vision providing information about what the perceiver can and cannot
do in order to satisfy goals, preferences, etc. extends that, but isn't nearly
enough. The use of vision in
SLAM (Simultaneous Localisation and Mapping)
is different again, though it combines some of Marr's and Gibson's uses, but
also indirectly supports many other things, including reasoning about what could
be happening in remote locations, or planning, etc. The ideas of Max Clowes in
the 1960s, which I have partly summarised here (and in the following sections),
are different again. He was influenced by related views of
perception from earlier researchers such as von Helmholtz, Gombrich and others.
Another whole family of uses (which is what got me into AI originally) is the
role of vision in various kinds of mathematical discovery and reasoning, leading
up to Euclid's Elements. I think those uses of vision occur unconsciously in
most humans and in many animals -- closely connected with information about
affordances and their limitations. Some examples are here:
I don't think it is possible to understand, model or replicate human visual
capabilities without exploring a variety of different questions about the uses
of vision, in humans, in other animals, in the evolutionary precursors, at
different stages of individual development and in different possible sorts of
artefacts that we may in future wish to build with visual capabilities.
What follows is a partial high level subdivision of topics to be investigated
regarding the functions of vision, in order to understand what sorts of
mechanism might be required, or might suffice, in different contexts. This is
both incomplete and very shallow: it is work in progress, at an early stage.
1. Examination of uses of vision in other animals.
Some visual competences of non-humans (e.g. squirrels, orangutans, and
nest-building birds such as weaver birds) refute certain hypotheses about the
central role of human language in human intelligence. There are also different
functions of vision related to different habitats and modes of locomotion, e.g.
swimming, crawling, sliding, walking, jumping, flying in open space, flying
through branches to a nest, etc.
2. Attempts to understand how the *uses* of vision as we know it evolved from
much earlier forms of perception, e.g. because such a survey may identify some
intermediate biological uses of optical information that still play an essential,
though unobvious, role in human vision. (This emphasis on evolution of
various kinds of biological information processing is the core of the
Meta-Morphogenesis (M-M) project.) That's a huge task with intolerably sparse
evidence, but I think there are ways of reducing the most obvious difficulties
by careful research planning.
3. Investigating how uses of vision develop in individuals from infancy onwards.
I think there are lots of very important things going on in pre-verbal infants,
that most people don't notice (though Piaget noticed some of them) and which lay
foundations for later developments. For example, I think metrical perception
(absolute values for size, length, speed, distance, volume, curvature etc.)
barely exists at first and instead many partial orderings (e.g. X is closer to
or is getting closer to Y) are perceived and used, e.g. in visual servoing.
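To make the contrast concrete, here is a minimal sketch (purely illustrative;
the function names and numbers are invented, and this is not offered as a model
of infant vision) of a servoing loop whose controller never receives absolute
distances, only the ordinal percept "the target got closer" or "it did not":

```python
# Illustrative sketch: visual servoing driven only by a partial ordering
# ("X is getting closer to Y"), with no absolute distances available to
# the controller. All names and values here are invented for the example.

def closer(prev_view, curr_view):
    """Ordinal percept: did the target get closer between two views?
    Internally this compares magnitudes, but the controller only ever
    sees the boolean answer, never the numbers themselves."""
    return curr_view < prev_view

def servo(initial_gap, step=1.0, max_steps=100):
    """Move a 'hand' toward a target using only closer/not-closer feedback."""
    gap = initial_gap
    direction = 1.0
    prev = abs(gap)
    for _ in range(max_steps):
        gap -= direction * step          # attempt a movement
        if abs(gap) < step / 2:          # near enough to count as contact
            return gap
        if not closer(prev, abs(gap)):   # got farther: reverse direction
            direction = -direction
        prev = abs(gap)
    return gap

final = servo(initial_gap=7.3)
print(abs(final) < 1.0)  # the hand ends up near the target
```

The point of the sketch is that the loop converges even when it starts moving
the wrong way: ordinal feedback alone is enough to steer it, with no metrical
information about sizes, speeds or distances ever reaching the controller.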
Another class of cases concerns use of vision in distinguishing different
materials and their properties (including distinguishing them by their behaviour
rather than static appearance) and using knowledge about materials to explain
differences in observed behaviour of manipulated objects and materials -- sand,
water, treacle, oil, syrup, butter, mud, plasticine, clingfilm, cooking foil,
paper, cardboard, various kinds of cloth, etc. etc.
I think this includes many unnoticed pre-mathematical functions, on which later
mathematical competences build (unless destroyed at school).
(Jackie Chappell -- a biologist -- and I have published some papers on how the
genome can play different roles at different stages of development, with later
roles partly determined by what was learnt earlier.) This can lead to
questions about different uses of vision in different cultures or in different
individuals with different developmental trajectories. Extreme cases are musical
sight-readers, outstanding painters, architects, mathematical experts in
geometry and topology, and of course programmers who are good at understanding
textual programs -- including noticing bugs others don't see. I think that's
partly related to abilities to see flaws in mechanical constructions ("That
building won't last through a heavy snow storm.") Working out the developmental
trajectories of various visual competences could provide important clues as to
what sorts of mechanisms are required.
4. Investigating uses of vision in dealing with other intelligent species.
The work on emotion recognition could be an example, but tends to be very
shallow and based on very shallow theories of human affective states and how
they relate to visible behaviour. I think there are lots of much more subtle
ways in which vision is used to gain information about intelligent individuals
(not just humans) e.g. what they are looking at, how they feel about what they
are looking at (bored, interested, surprised, dismayed...) and what they are
likely to do as a result, whether they understand something, whether they are
trying to deceive, whether they are confident about what they are doing, whether
they are doing it carefully or not, and many more.
Often perception of information processing in another individual is part of a
sophisticated interaction, e.g. a teacher trying to understand how to help a
pupil who hasn't understood something, flirting, dancing, collaborating on a
complex task, and many more. The visual cues in most of these cases are extended
over time and can include not just eyes and face, but body parts and their
relations to other things, e.g. picking up an unfamiliar object nervously, etc.
5. Investigating cultural evolution of visual functions.
This includes a whole lot of different things -- including the uses of vision in
human sign language, which can involve perception of very complex parallel
movements of many body parts. It's much richer than either speech or text.
It includes changes of visual functions of domesticated animals.
6. Investigating uses of vision in mathematical discovery, mathematical reasoning, and related processes of designing and understanding complex artefacts,
e.g. a bi-stable
spring-driven car boot (trunk) lid. This is what got me from mathematics into
philosophy, and then from philosophy into AI. The research problems are very
difficult, and progress is slow.
7. Uses of vision in aesthetic contexts:
enjoying a view, admiring a face, a
dance, a building, a tree, a painting, etc.
8. Investigating uses of vision in connection with sexual functions,
including finding a potential mate attractive, being sexually stimulated, etc.
There is much more to be added.
In particular I have been collecting a variety of different
examples of human visual abilities related to mathematical reasoning in geometry
and topology. This connects with abilities to play with and understand certain
kinds of toys, especially construction kits (e.g. Meccano, Tinkertoy,
Fischertechnik, etc.). Even understanding how clothes work, and how you can and
cannot put them on is related to this. An example is here:
Part of the argument is that there is no sharp divide between what counts as
vision and what counts as something 'more central'. Often more central processes
go on in registration with some of the optic array details. A surprising case is
a pair of face images whose eyes are geometrically identical. Look at each face
for a couple of seconds at a time. You may notice that the eyes look different.
That may be connected with the way some things look fragile, look unstable, etc.
It would be useful to develop an outline (possibly evolution-inspired) framework
for collecting different functions of vision into a web site that will
include information from many disciplines, and can go on growing and being
revised. I think that might be particularly useful both for young researchers
trying to find interesting new research problems, and for people trying to plan
research or development projects, find collaborators, etc. (This is a result of
observing how narrowly most vision researchers seem to think about the role of
vision, and how disparate and disconnected the research community is.)
A lot of help will be needed to grow it and develop the structure. I don't know
how many researchers would be interested enough to contribute, instead of simply
continuing on their existing focus.
In parallel with developing a map of functions of vision, it would be useful to
develop a map of components of possible explanatory models, for example a map of:
kinds of visual mechanisms,
forms of representation,
types of algorithm (and other mathematical structures?),
functional roles for vision in a larger architecture.
This would build on and feed back into the map of uses/functions of vision.
But it's important to note that producing maps of functions and maps of
mechanisms are two different tasks, since in general many different mechanisms
can serve the same function. Most researchers seem to focus only on mechanisms, making
unacknowledged assumptions about the functions, often different assumptions in
different research teams.
To be added
Uses of vision in mathematical reasoning of a non-visual kind, e.g.
seeing and understanding algebraic, logical, computational formulae.
Uses of vision in mathematical reasoning where mathematicians annotate a step in
a proof: "By inspection". E.g. where a diagrammatic notation for exploring cases
has been discovered.
This discussion of the exploration of "Toddler theorems" (and perhaps
pre-toddler theorems) includes examples where vision plays a central role.
A child aged about 18 months was observed trying to join two toy train
trucks by bringing the two ring ends together and seemed to be mystified
and frustrated because he could not complete the assembly, despite the
clear visibility of the hook at the other end. Is there something he had
not yet developed the ability to see?
A video is available.
RELATED DOCUMENTS AND PAPERS
SLAM: Simultaneous Localisation and Mapping -- a machine explores an unfamiliar
building, or some other novel terrain at the same time as constructing a usable
store of information about the layout, possibly including a topological or
metrical map. The terrain may be 1-dimensional, such as a long corridor;
2-dimensional, such as a town or a floor of a building; or 3-dimensional, e.g. a
multi-storey building or non-flat terrain. The senses used may include vision,
some sort of range finder (laser or sonic), odometers, gyroscope, compass, etc.
One of the hard problems is dealing with errors, or lack of precision in the
sensor values. Errors can accumulate drastically, and several techniques have
been developed for minimising the impact.
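As a concrete illustration of the error-accumulation problem, here is a minimal
1-D sketch. The 1% odometer bias, the landmark measurement error, and the
simple weighted-average correction are all invented for illustration; real SLAM
systems use probabilistic filters (e.g. extended Kalman filters or particle
filters) rather than this crude update.

```python
# Minimal 1-D sketch of how dead-reckoning error accumulates, and how a
# single landmark observation can correct it. All numbers are invented.

true_pos = 0.0
estimate = 0.0

# Dead reckoning: each odometry reading overestimates the step by 1%,
# so the estimation error grows linearly with distance travelled.
for _ in range(200):
    true_pos += 1.0
    estimate += 1.01

drift = abs(estimate - true_pos)          # 200 * 0.01 = 2.0

# One sighting of a landmark whose position is known, measured with a
# much smaller error, pulls the estimate back toward the truth via a
# weighted average that favours the more reliable measurement.
landmark_measurement = true_pos + 0.1
estimate = 0.1 * estimate + 0.9 * landmark_measurement

corrected = abs(estimate - true_pos)      # 0.1*2.0 + 0.9*0.1 = 0.29
print(drift, corrected)
```

Even this toy version shows the key asymmetry: odometry errors compound over
time, while occasional absolute fixes against recognised landmarks bound them,
which is why loop closure is so important in practical SLAM systems.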
I have been thinking and writing about functions of vision including unobvious
functions for several decades. I am collecting the papers I can find here, to
help with a future attempt to integrate the ideas.
A. Sloman (1978 Chapter 9)
Perception As A Computational Process, Chapter 9 of
The Computer Revolution In Philosophy: Philosophy, science and models of mind
Harvester Press 1978.
Aaron Sloman, 1980,
'What kind of indirect process is visual perception?'
In Open Peer Commentary on Shimon Ullman: `Against Direct Perception'
Behavioral and Brain Sciences, 3, pp. 401-404,
Aaron Sloman and David Owen, 1980,
Why Visual Systems Process Sketches
in Proceedings AISB 1980 Conference Amsterdam 1st-4th July, 1980
Why do people interpret sketches, cartoons, etc. so easily? A theory is
outlined which accounts for the relation between ordinary visual perception
and picture interpretation. Animals and versatile robots need fast,
generally reliable and "gracefully degrading" visual systems. This can be
achieved by a highly parallel organisation, in which different domains of
structure are processed concurrently, and decisions made on the basis of
incomplete analysis. Attendant risks are diminished in a "cognitively
friendly world" (CFW). Since high levels of such a system process
inherently impoverished and abstract representations, it is ideally suited
to the interpretation of pictures.
Aaron Sloman, 1983,
Image interpretation: The way ahead?, in
Physical and Biological Processing of Images
(Proceedings of an international symposium organised
by The Rank Prize Funds, London, 1982.),
Eds. O.J. Braddick and A.C. Sleigh.
What are the purposes of vision?
Based on an invited presentation at
Fyssen Foundation Workshop on Vision,
Versailles France, March 1986, Organiser: M. Imbert
(The proceedings were never published.)
A. Sloman, 1989
On designing a visual system: Towards a Gibsonian computational model of vision.
Journal of Experimental and Theoretical AI (JETAI), 1, 4, pp. 289-337
This paper contrasts the standard (in AI) "modular" theory of the nature of
vision with a more general theory of vision as involving multiple functions and
multiple relationships with other sub-systems of an intelligent system. The
modular theory (e.g. as expounded by Marr) treats vision as entirely, and
permanently, concerned with the production of a limited range of descriptions of
visible surfaces, for a central database; while the "labyrinthine" design allows
any output that a visual system can be trained to associate reliably with
features of an optic array and allows forms of learning that set up new
communication channels. The labyrinthine theory turns out to have much in common
with J.J. Gibson's theory of affordances, while not eschewing information
processing as he did. It also seems to fit better than the modular theory with
neurophysiological evidence of rich interconnectivity within and between
sub-systems in the brain. Some of the trade-offs between different designs are
discussed in order to provide a unifying framework for future empirical
investigations and engineering design studies. However, the paper is more about
requirements than detailed designs.
Challenge for Vision: Seeing a Toy Crane
A. Sloman, 1993,
The mind as a control system,
Philosophy and the Cognitive Sciences,
eds. C. Hookway and D. Peterson,
Cambridge University Press,
Many people who favour the design-based approach to the study of mind, including
the author previously, have thought of the mind as a computational system,
though they don't all agree regarding the forms of computation required for
mentality. Because of ambiguities in the notion of 'computation' and also
because it tends to be too closely linked to the concept of an algorithm, it is
suggested in this paper that we should rather construe the mind (or an agent
with a mind) as a control system involving many interacting control loops of
various kinds, most of them implemented in high level virtual machines, and many
of them hierarchically organised. (Some of the sub-processes are clearly
computational in character, though not necessarily all.) A feature of the system
is that the same sensors and motors are shared between many different functions,
and sometimes they are shared concurrently, sometimes sequentially. A number of
implications are drawn out, including the implication that there are many
informational substates, some incorporating factual information, some control
information, using diverse forms of representation. The notion of architecture,
i.e. functional differentiation into interacting components, is explained, and
the conjecture put forward that in order to account for the main characteristics
of the human mind it is more important to get the architecture right than to get
the mechanisms right (e.g. symbolic vs neural mechanisms).
Of course, we need to get both right -- but with the wrong architecture, and the
wrong decomposition into sub-functions, we are likely to seek entirely
inappropriate mechanisms and algorithms.
Aaron Sloman, 2011,
What's vision for, and how does it work?
From Marr (and earlier) to Gibson and Beyond,
Online tutorial presentation, also at
Architectural and Representational Requirements for Seeing Processes, Proto-affordances and
From Dagstuhl workshop on
Logic and Probability for Scene Interpretation (2008)
Evolvable biologically plausible visual architectures,
Proceedings of British Machine Vision Conference (BMVC-01), Manchester,
Eds. T. Cootes and C. Taylor, BMVA, pp. 313--322,
School of Computer Science
The University of Birmingham