Their significance for mathematical cognition,
current serious limitations of AI vision systems,
and philosophy of mind (contents of consciousness).
This is part of the Turing-inspired Meta-Morphogenesis project,
concerned with identifying and explaining the many transitions in types
of information-processing in the course of biological evolution on Earth:
Current AI vision systems are not able to have mathematical qualia, as
Archimedes and many of his contemporaries, predecessors and successors
had! I am trying to understand requirements for removing that limitation.
Inspired in part by Immanuel Kant, Oscar Reutersvard, James Gibson,
and Max Clowes, who introduced me to AI vision research in 1969.
School of Computer Science
University of Birmingham
NOTE: This is work in progress
This document has been through several major reorganisations,
which may have led to internal inconsistencies and poor formatting, to be fixed later.
My thanks to Dima Damen http://www.cs.bris.ac.uk/~damen/, for the invitation to talk about vision at Bristol University, 2nd Oct 2015, which launched this document, and Aviv Keren, for useful comments on earlier versions: https://www.researchgate.net/profile/Aviv_Keren
Also colleagues, students, and friends over many years, especially Max Clowes (1933-1981) who introduced me to AI and a new way of thinking about vision. See Clowes Tribute.
A partial index of discussion notes is in
[8 Nov 2015; Updated 28 Dec 2015; 18 Jan 2016; 3 Feb 2016; 29 Nov 2016; 16 Oct 2017]
This was originally intended to be a much simpler document focused mainly on lessons to be learnt from our ability to perceive, think about, and reason about not only what currently exists or is happening in the environment but also what could, could not, or must, exist or happen. These can be described as modal features of the environment, in contrast with categorical features, that are restricted to what is the case. The ability to perceive and reason about modal features can be described as modal competences.
My original interest in this topic was sparked by an attempt to clarify, defend and extend Immanuel Kant's claims in Kant (1781), about the nature of mathematical discovery. This required, among other things, analysing requirements for perceptual systems, especially visual perception in animals and future intelligent machines. However, when I began this work, for my DPhil thesis (1962; available digitised since 2016), I knew nothing about computers or AI.
But the nature of the subject matter, including the variety of examples that kept turning up, forced the document to grow considerably beyond the first draft (written October 2015).
These perceptual abilities extend James Gibson's ideas in Gibson (1979) about the function of perception as being primarily to provide organisms with information about positive affordances -- that is, information about possible actions they are able to perform in their current situation that could be relevant to their needs or interests -- and negative affordances: things that they cannot do or that would not be useful. This can be seen as a subset of the phenomena Immanuel Kant thought about in connection with the nature of mathematical knowledge. (Some of the connections between Kant and Gibson were pointed out in [Mace,2005], but not the connection with mathematical discovery.)
Gibson seems not to have noticed that the perceived affordances he discussed are a subset of a broader collection of modal perceptual competences. An example is the ability to identify possible and impossible structures and processes in the environment, including necessary consequences that need have nothing to do with the perceiver's needs or interests. A necessary consequence of realising the possibility of adding exactly three new buttons to a collection of exactly five buttons, without changing anything else, would be production of a collection of eight buttons.
Other examples include abilities to discover necessary truths in topology and geometry, for example that containment is transitive. Examples of such necessary connections can be relevant to a perceiver's actions but not all need be, for example, if an event happens on Mars and Mars is part of the solar system then the event happens within the solar system.
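To make the content of the transitivity claim concrete, here is a minimal Python sketch that checks containment among axis-aligned rectangles on a small integer grid. The rectangle encoding and the names `all_boxes`, `contains` and `find_counterexample` are assumptions introduced purely for illustration. Exhaustively testing instances is of course not the kind of insight into necessity discussed in this document: the sketch only shows what the claim rules out, not why it must hold.

```python
from itertools import combinations, product

def all_boxes(size):
    """All axis-aligned boxes (x0, y0, x1, y1) with integer corners in 0..size."""
    spans = list(combinations(range(size + 1), 2))  # (low, high) with low < high
    return [(x0, y0, x1, y1) for (x0, x1), (y0, y1) in product(spans, spans)]

def contains(outer, inner):
    """True if `inner` lies entirely within `outer` (boundaries may touch)."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1

def find_counterexample(boxes):
    """Search for a, b, c with a inside b and b inside c, but a not inside c."""
    for a, b, c in product(boxes, repeat=3):
        if contains(b, a) and contains(c, b) and not contains(c, a):
            return (a, b, c)
    return None

boxes = all_boxes(3)
print(len(boxes), find_counterexample(boxes))
```

No counterexample turns up, however large the grid is made. A human reasoner, by contrast, can see without any enumeration that none could turn up.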
More generally, in humans (and possibly several other intelligent types of animal) the functions of vision include perception of modalities (i.e. what is possible, impossible, or necessarily the case). This has nothing to do with discovering probabilities or combining sensory modalities (touch, sound, sight, etc.), though it can use any or all of those sensory modalities.
These discoveries can be about exosomatic information, for example discoveries about what is or is not possible in the environment -- i.e. outside the skin. That contrasts with learning sensory-motor and other somatic relationships (correlations inside the organism's skin). (Evolution made use of many implicit mathematical discoveries long before there were human mathematicians, but that's another sub-topic.)
I first attempted to explicate the modal concepts used here in chapter 7 of my
1962 DPhil thesis defending Kant's philosophy
of mathematics against attacks by philosophers who had no personal experience of
discovering and proving geometrical truths. A digitised, searchable, version of
the thesis was made freely available online in 2016:
http://www.cs.bham.ac.uk/research/projects/cogaff/sloman-1962/ (HTML and PDF)
The claims about human abilities to perceive possibilities, impossibilities and necessities are illustrated below using modified versions of a picture drawn by Oscar Reutersvard in 1934, as a key example. However, many other examples are presented. In particular I'll offer examples related to proto-mathematical discoveries made by pre-verbal human toddlers presented and discussed in this (also very messy) document on "Toddler theorems": http://www.cs.bham.ac.uk/research/projects/cogaff/misc/toddler-theorems.html
Some of the material in this document is re-visited in the context of my long
term attempts to understand the kinds of reasoning
required and the evolved biological mechanisms that
made the reasoning possible. In 2017 I started trying to spell out requirements
for what I've temporarily labelled a "super-Turing membrane machine" able to
reason about possible and impossible deformations of shapes (e.g. triangles) and
the consequences, discussed explicitly in this draft document:
and implicitly in:
Expanded abstract for PTAI Conference Nov 2017
Some readers may wish to skip the preliminary remarks and jump straight to examples, e.g. the section on Representing possible processes.
An introduction to the Turing-inspired Meta-Morphogenesis project can be found here:
It includes a large, growing and messy collection of draft papers on evolution of biological information processing mechanisms, partly inspired by the work of Alan Turing. Recurring themes in this work include the role of implicit mathematical discoveries made by biological evolution (natural selection as a "Blind Theorem-Prover"). Around November 2014 the project began to emphasise 'construction kits' of many sorts, including the fundamental construction kit (FCK) provided by physics and chemistry, and increasingly complex and more specialised derived construction kits (DCKs).
However, there are important functions of vision that are not included in these obvious functions. Vision also provides information about possibilities, impossibilities, and necessary consequences. That can also include acquiring conditional information, e.g. about what WOULD HAVE BEEN the case if ..., or what WOULD BE or WILL BE the case if... . E.g. perceiving an apple hanging in a tree as supported by its stalk provides information about what would happen if the stalk were to break. As in the previous cases the information available in some cases is incomplete or unreliable or in some other way less than perfect. What is derived is then thought to require inferences about probabilities of various alternatives. However this paper is not concerned with those cases, but cases where a change will make something possible, or impossible, or necessarily the case. E.g. it is possible for me to move closer to an open doorway to another room, and if I do I shall necessarily see more of the room if information travels in straight lines.
These un-noticed or inadequately understood functions of vision are concerned with obtaining information about what is POSSIBLE or IMPOSSIBLE, or NECESSARILY or CONTINGENTLY the case in the environment. (What is contingent is possibly true and possibly false and neither necessarily true nor necessarily false.) For more on these "alethic" modal concepts see https://en.wikipedia.org/wiki/Modal_logic#Alethic_logic.
The visual functions described in terms of modal concepts of possibility, impossibility, necessity and contingency, have nothing to do with PROBABILITIES (although probabilities presuppose possibilities). In particular, the concepts "possible", "impossible", and "necessarily true" are totally different from notions of a non-zero, zero, or 100% probability.
The functions of vision related to perception of possibilities and impossibilities (constraints on possibilities) seem rarely to be noticed by vision researchers, although some researchers interested in perception have investigated at least some of them, including Immanuel Kant, in Kant (1781), and more recently James Gibson, in Gibson (1979), and his followers. But Gibson and most psychologists, unlike Kant, typically fail to address relationships between mathematical competences and these visual competences, the main topic of this document. Piaget was an exception, especially in his last book and in his 1952 book, e.g. on the gradual development of understanding of 1-1 correlations and cardinality.
Probability concepts presuppose concepts of possibility, since probabilities are comparisons among sets of possibilities. These are often partial orderings, sometimes with numerical comparisons added. However, the (alethic) modal concepts of possibility, impossibility, and necessity used here do not presuppose probabilities. In particular, they are totally different from numerical probability concepts.
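The difference between modal status and probability can be illustrated with a small Python sketch. It is a toy, not a proposed mechanism: the situation (two blocks, A and B, placed in different slots of a three-slot row) and all names are assumptions made for illustration, and the alternatives enumerated are alternatives for a small fragment of a scene, not whole possible worlds. The modal classification uses only the set of alternatives; a probability needs, in addition, a weighting over that same set.

```python
from itertools import permutations

# Alternatives: every way of putting blocks A and B into different slots
# of a three-slot row (a hypothetical toy situation).
ALTERNATIVES = [{"A": a, "B": b} for a, b in permutations(range(3), 2)]

def modal_status(prop):
    """Classify a proposition using only the set of alternatives."""
    results = [prop(alt) for alt in ALTERNATIVES]
    if all(results):
        return "necessary"
    if not any(results):
        return "impossible"
    return "contingent"

def probability(prop, weights):
    """Probability of prop under a weighting of the same alternatives."""
    favourable = sum(w for w, alt in zip(weights, ALTERNATIVES) if prop(alt))
    return favourable / sum(weights)

def a_left_of_b(alt):
    return alt["A"] < alt["B"]

def same_slot(alt):
    return alt["A"] == alt["B"]

uniform = [1] * len(ALTERNATIVES)
# A skewed weighting that gives weight only to alternatives with A left of B.
skewed = [1 if a_left_of_b(alt) else 0 for alt in ALTERNATIVES]

print(modal_status(a_left_of_b),
      probability(a_left_of_b, uniform),
      probability(a_left_of_b, skewed))
print(modal_status(same_slot))
```

Under the skewed weighting "A is left of B" has probability 1.0 and its negation has probability 0, yet the first remains contingent and the second remains possible. Changing the weights changes the probabilities but leaves the modal classification untouched.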
There are deep unanswered questions about whether and how the alethic modal concepts, "possible", "impossible", "necessary", and "contingent" (= neither impossible nor necessary) are used by other animals, and about how they can be represented in information-processing systems (e.g. in minds of animals or robots). The roles these concepts play in intelligence tend to be mis-described, or ignored by perception researchers, especially their connection with mathematical knowledge.
I shall use a variety of examples to illustrate some of the ways these modal concepts work, why they are important for intelligent animals or machines, how the functions of vision (and more generally perception) involve them, and how they are connected with mathematical discoveries.
A feature of the analysis presented here is rejection of "Possible worlds semantics" for the modal concepts relevant to intelligent agents (including non-human intelligent agents, such as squirrels, and early humans who made ancient mathematical discoveries). For background information on possible world semantics see, for example,
The modal concepts used here are based on the analysis of Kant's intentions in Sloman (1962).
One of the important functions of vision is to obtain information about how the environment relates to abilities, risks, needs, or intentions of other agents -- i.e. "vicarious" affordances. Gibson ((1966) and (1979)) discussed some special cases of this, but I don't think he saw all the important implications or pre-requisites of being able to see what is relevant to the desires, intentions, preferences, beliefs, of other agents, or oneself. These are topics that need full discussion on another occasion, though they will be briefly mentioned below. (See also Sloman (2009a).)
We'll see that information about what is or is not possible is relevant both to the immediate practical uses of vision and also to the roles of vision, and meta-cognition, in some types of mathematical discovery. I'll try to indicate, in very crude outline, how the earliest mathematical discoveries might have been concerned with meta-theories about possibilities for action. The need for theories about possibilities arises naturally for intelligent agents choosing and acting in a structured environment. The need for meta-theories arises if those agents have the ability to detect and reflect on their own theories.
Meta-meta theories are required for reflecting on or discussing the properties of those meta-theories, and how they can be found to be true. The evolutionary changes making that possible also made possible the kinds of mathematical presentation found in Euclid's work.
Later forms of mathematics (based on formal systems developed in the last two centuries) have other functions, which will not be discussed here. So I'll ignore most mathematics based on the developments in logic, set-theory and formal systems since around 1900. The view of some mathematicians that what happened earlier was not really mathematical discovery and reasoning is just false, unless the label "mathematics" is re-defined to make it true.
There can be enormous variations between the visual capabilities and performances of individual humans, individual monkeys, individual crows, etc. This is because the abilities to use vision to acquire modal information are not all innate: they develop under the influence of the environment. Individuals in different environments will therefore develop different visual competences. (There may also be genetic differences.)
Learning to perceive different sets of possibilities and constraints can be compared with differences between learning to read French and learning to read Chinese. In both cases there are considerable individual and cultural differences. It follows that requiring experiments on vision to provide reliable repeatable data about humans will rule out many experiments that provide information about important visual capabilities, since what is true of one person may not be true of another. Yet facts about individuals, even unusual or unique individuals, are part of what science needs to explain, e.g. by explaining how the individuals process information, including how and why they differ.
A good theory of vision should explain how the individual competences work (preferably demonstrated in working AI models) AND how a (mostly) common genetic heritage can produce differences in competences of individuals that share the heritage. That requires a model of individual development, some features of which are sketched in the section on Evo-Devo Issues.
My complaints about widespread neglect of important functions of vision apply both to theories of human vision and theories of animal vision, and to statistics-based models and theories of vision that have been successfully applied in special purpose robots and other machines with useful, but very limited functionality. The main theme here is the need for a good theory of vision to be part of a good theory explaining important types of mathematical discovery.
This point can be generalised: a good theory of mind, or of evolution of minds,
or of development of minds needs to explain the abilities of at least one sort
of mind to discover and use mathematical truths about what is and is not
possible. Such information is very different from abilities to acquire and use
information about probabilities that many current AI systems focus on. In
particular, a good theory of what minds are and how they evolved needs to
explain what made it possible for Euclid and other ancient mathematicians to
make the discoveries reported in
Euclid's Elements. Additional examples are
presented in these papers and papers they reference:
Although no detailed explanation (or working model) exists, we can discuss requirements for future candidates. Some incomplete conjectures are presented below.
This claim that seeing is "reconstructing" is often attributed to David Marr (1982), though it was taken for granted by AI researchers much earlier, e.g. Roberts, 1965 and others surveyed in Ballard and Brown, 1983, though they proposed different theories about the details.
On this view, the main functions of vision should be the same across all species, though Marr acknowledged that for some species, e.g. insect species, the functions might be different, and of course some human visual capabilities, such as reading text or musical scores, understanding sign languages, and interpreting maps and engineering drawings, are unique to humans.
In opposition to these views, James Gibson ((1966) and (1979)) criticised researchers who thought the function of vision in animals was simply to produce some sort of representation (or collection of representations) of the objects visible in the environment, including information about distances to their surfaces, orientations of visible surfaces, illumination, and a variety of other geometrical relationships.
He pointed out that there is a completely different function for perception in general, and vision in particular, which he called perception of affordances. In that sort of perception, the information acquired is not about the actual contents of the environment in a form that is independent of the perceiver's capabilities and interests, but is information relevant to potential actions of the perceiver: e.g. what actions are possible for the perceiver in the current environment, given the perceiver's physical capabilities and current needs (or goals, preferences, etc.). Actions produce changes, so the perceived information is about possible changes the perceiver can bring about. We'll generalise that below.
A more extreme version of Gibson's view treats the information content derived from visual sensory information as heavily dependent on the viewer's anatomy and physiology, and current or possible needs, preferences, dislikes, etc.
An even more extreme theory could claim that there is no explicitly describable visual content, only an unintelligible mass of conditional causal connections between sensor neurones and motor neurones, modulated by signals from internal sensors concerned with the organism's current needs. I think some researchers who emphasise embodied cognition and who deny the use of representations, are implicitly committed to such an extreme position. But that view will be ignored here. (I expect any experienced engineer will easily see its flaws.) Most of this paper presents and analyses cases of perception of what is and is not possible. Some parts discuss implications for mathematical cognition.
For more intelligent species, perception, and especially vision, can also be used for acquiring information about what is and is not possible: i.e. modal information. So theories, models, and robotic implementations of vision systems that ignore perception of and reasoning about possibilities, impossibilities and necessities are seriously impoverished. That criticism of standard theories of vision is closely related to Gibson's criticisms, but he does not go far enough in the direction proposed below (elaborating on Sloman (2009a)).
So, vision (and to a lesser extent other modes of perception) can be used not only to gain information about what is the case in the environment, but also information about possibilities, and relations between possibilities. This generalises Gibson's ideas about perception of positive and negative "affordances" for the agent. In particular possibilities and constraints on possibilities that are perceived visually need not concern actions or needs or preferences of the perceiver; the visually acquired information can go not only beyond spatial structures, and immediately useful information about possible actions for the perceiver, but can also include future possibilities, explanations of previously realised possibilities or failures, and also discovering possible or impossible events or processes that have nothing to do with the perceiver's intentions, plans, or actions.
The need to generalise Gibson's ideas
Most "Gibsonian" theories of perception (especially visual perception) that I am aware of fail to do justice to the variety of functions of vision, the variety of types of contents of visual experience, and consequently the variety of requirements for explanatory mechanisms, or mechanisms needed to give robots human-like (or even squirrel-like, crow-like, etc.) visual capabilities. This is also true of theories of intelligence or cognition that (over-)emphasise embodiment. Focusing on too few examples of what needs to be explained leads to bad theories in both science and philosophy. It can also lead to impoverished engineering.
In particular, theorists emphasising embodiment often ignore the distinction between "online" and "offline" uses of visual information, discussed below, and the more subtle division between different "offline" uses of perception of what is the case, including perception of what is and is not possible and how those possibilities and impossibilities can change if some current possibility is realised. Processes of predicting, planning, designing and explaining may all use chains of alterations in what is and is not possible.
Moreover, some of the perceptually available kinds of information about possibilities and impossibilities illustrated below are essential to ancient mathematical discoveries, for example in geometry and topology, recorded in Euclid's Elements. These are compared with logical discoveries in http://www.cs.bham.ac.uk/research/projects/cogaff/misc/ijcai-2017-cog.html
The connection between the functions of visual perception in humans and other animals, and mathematical discoveries made by Euclid and his predecessors is the main topic of this paper, but the connections involve somewhat long and tortuous links.
Later on, the discussion will focus on some visual capabilities of mathematicians: not modern mathematicians reading logical and algebraic formulae and proofs (requiring a related, but different, set of competences), but the ancient mathematicians whose discoveries I suspect led, eventually, to Euclid's Elements. This requires understanding how evolution of biological functions of human vision, including visual competences shared with other species, led to capabilities that could have supported mathematical discoveries. Some of those evolutionary changes may be recapitulated in child development, and understanding the details may be essential for high quality mathematics teaching, but that's a topic that will not be addressed in detail here. (See the sections on Toddler Topology, and Toddler Theorems below, and the epigenetic schema in the section on Evo-Devo Issues.)
In my youth it was still customary to teach geometric mathematical competences at school, but most current youngsters seem to be deprived of that privilege. Many readers will therefore unfortunately have no prior experience of some of the phenomena under discussion. Links are provided to web pages presenting various more or less elementary fragments of Euclidean geometry and a subset of topology concerned with continuous deformations in space. I shall try to present examples that are intelligible to non-mathematicians, all of whom have mathematical competences, whether recognized or not.
The work presented here implicitly presents requirements for some of the construction kits that build human visual systems. We need open minds as to whether well-known forms of computation and physical assembly suffice.
(a) Online information about affordances is used immediately in triggering new behaviours or modifying existing behaviours (e.g. blinking reflexes, swerving to avoid something, changing direction while chasing something, closing a fist around something seen to be graspable).
This is not necessarily a sharp dichotomy: there may be processes/activities that use both online and offline functions of vision, sometimes in succession and sometimes in combination. However, in the extreme cases the types of information-processing mechanism required are very different, even if intermediate cases arise from use of both types of mechanism in combined tasks.
(b) Offline information about affordances is used in considering possibilities, comparing possibilities, understanding relationships between possibilities, selecting possibilities to be achieved at some later time, or deciding between alternative possibilities that could explain past events or states. More sophisticated cases involve use of information about impossibilities. (Examples are given below, and in Sloman (2007-2014).)
I suspect some of the enthusiasm for "embodied cognition" and "extended mind" theories is based partly on recognition of the importance of online intelligence coupled with blindness concerning offline intelligence, and partly on ill-founded anti-computational prejudices in some cases. But I shall not pursue those points here.
Note on the online/offline distinction
I have recently learnt that other writers use the online/offline distinction in partly related ways. (I may have picked it up from one of them.)
I think I first encountered the phrase "online intelligence" in a talk by Karen Adolph in 2007. But the online/offline distinction is closely related to the distinction between "reactive" and "deliberative" sub-systems familiar in AI long before that, and much used in the CogAff Project:
In The Computer Revolution in Philosophy (1978), Chapter 6 used the labels "executive" and "deliberative" for a related distinction:
Sloman(1983) makes closely related distinctions using different terminology, e.g. comparing the use of vision to control painting the edge of a table with more descriptive uses focused on by AI researchers.
The distinctions were elaborated following a discussion with Dean Petters, in
Many of the details are ignored here, though they should all be seen as part of a larger investigation linking modes of representation, types of perception, modes of reasoning, and modes of learning and discovery.
Note on the irrelevance of "possible world" semantics
There are many philosophers who have worked on an idea (with a long history, but sharpened in the last quarter century or so by philosophers like David Lewis and Saul Kripke, among many others), namely the notion that our ideas of possibility and necessity depend on a prior idea of a set of possible worlds. (It is not usually expressed so baldly.) I think that analysis is completely misguided, and that the ideas of Gibson about the possibilities for change in particular contexts considered by intelligent agents (including young children, and other intelligent animals) point to a deeper, more 'local', basis for modal concepts, allowing simpler versions to be used by other intelligent species and pre-verbal children.
Instead of possible whole worlds we use possible alternative fragments of the world, usually restricted to an accessible part of space time, though one aspect of cognitive development is increasing ability to consider larger extensions, in space and time (past and present).
Ultimately this will relate to the combinatorial powers supported by physics (including the structure of space-time) and chemistry. But that is a topic for another discussion. Some of the ideas were presented in my DPhil thesis in 1962, and in Sloman (1996), which introduced the idea of physical objects or mechanisms being "possibility transducers". (E.g. possible voltages applied to a fixed resistor are associated with possible currents.)
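The resistor example can be sketched in a few lines of Python. This is only one hedged reading of "possibility transducer": it assumes an ideal resistor obeying Ohm's law (I = V / R) and represents a set of possibilities as a closed interval. The names `resistor`, `transduce` and `is_possible` are hypothetical, introduced here for illustration.

```python
def resistor(resistance_ohms):
    """Return a transducer from possible voltages to possible currents.

    Assumes an ideal resistor obeying Ohm's law, I = V / R.
    """
    if resistance_ohms <= 0:
        raise ValueError("resistance must be positive")

    def transduce(voltage_range):
        # An interval of possible voltages maps to an interval of possible currents.
        low, high = voltage_range
        return (low / resistance_ohms, high / resistance_ohms)

    return transduce

def is_possible(value, interval):
    """True if `value` lies within the interval of possibilities."""
    low, high = interval
    return low <= value <= high

transduce = resistor(100.0)            # a fixed 100 ohm resistor
currents = transduce((0.0, 5.0))       # possible voltages: 0 V to 5 V
print(currents, is_possible(0.02, currents), is_possible(0.1, currents))
```

The object does not fix which voltage or current actually occurs. It fixes how one range of possibilities constrains another, so that a current of 0.1 A is ruled out for this resistor and this voltage range.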
[[Add note on how this connects with John Barnden's ATT-META mechanism.]]
(a) Online use of visual information requires fast-acting information stores (memory mechanisms) whose contents constantly influence forms of behaviour, and which are constantly overwritten as new information comes in, so that any use of the information has to be fast.
In many (most?) cases the mechanisms using such information are fast-acting (i.e. reflexes) and either innate or produced by extended learning or training, e.g. in many sporting activities, musical competences, linguistic competences, and others. Some may use evolutionarily very old mechanisms (e.g. blinking), others newer, more sophisticated, mechanisms (e.g. musical sight-reading).
(b) Offline use of visual information requires longer term forms of storage, so that information acquired at a particular time can be used at different times, for multiple purposes, usually in combination with other forms of information, new and old, often on the basis of temporarily assembled structures -- using what are often referred to as "deliberative" mechanisms, discussed in more detail in Sloman (delib).
Among some psychologists, neuroscientists and even philosophers, a failure to understand this distinction has led to deep muddles about "What" vs "Where" visual processing pathways in brains. Both online and offline visual processing can include identification/categorisation mechanisms ("what") and inferences about location ("where"). And each of those two can use the other. I've never understood how anyone took the What/Where idea seriously.
See Sloman (1982).
Another common muddle seems to involve the assumption that online uses of visual information to initiate or control action are somehow incompatible with the use of the same information to provide content for visual consciousness, so that the process cannot be reflected on, talked about, evaluated, etc. ([REFS needed -- E.g. Milner and Goodale ????]). This assumption both underrates the sophistication of some of the engineering designs produced by biological evolution and also underrates what might one day be achieved by robot designers -- if it has not already been achieved in robot visual learning mechanisms, that use repeated trial and error learning to "re-shape" control algorithms.
In the case of many human online skills, e.g. in athletics, playing a musical instrument, painting pictures, and many craft skills, apprentices depend on the ability of experts not only to perform skilfully in reactive mode but also to be aware of what's going on and use that information to help learners.
I don't know of any AI robot that can learn and teach in this way, but in simple cases it should be feasible soon.
The ability to use information offline, in forming and executing multi-step plans, is often thought to be restricted to a small subset of vertebrates, but there is evidence of such abilities in other species, including the Portia spider.
Planning and deliberation by Portia spiders
The Portia spider works out a route to its prey then follows it even when it can no longer see the prey, making detours if necessary and avoiding branches that would not lead to the prey.
"By visual inspection, they can select, before setting out, which detour routes do and do not lead to prey, and successfully perform a detour with no further visual contact with the prey".
M. Tarsitano, 2006, Route selection by a jumping spider (Portia labiata) during the locomotory phase of a detour,
Animal Behaviour, 72, Issue 6, pp. 1437--1442,
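The kind of offline route selection described in the quotation can be caricatured as a search over a branching structure, carried out before any movement begins. The Python sketch below is an illustration only: the graph `BRANCHES`, the junction names, and the use of breadth-first search are all assumptions, and nothing here is claimed about how the spider actually does it.

```python
from collections import deque

# Hypothetical branch structure: each junction lists the junctions reachable from it.
BRANCHES = {
    "start": ["left", "right"],
    "left": ["dead_end"],
    "right": ["upper", "lower"],
    "upper": ["prey"],
    "lower": [],
    "dead_end": [],
    "prey": [],
}

def plan_route(graph, start, goal):
    """Breadth-first search returning a shortest route, or None if none exists."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for nxt in graph.get(route[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(route + [nxt])
    return None

def leads_to(graph, first_step, goal):
    """True if some route from `first_step` reaches `goal`."""
    return plan_route(graph, first_step, goal) is not None

print(plan_route(BRANCHES, "start", "prey"))
print(leads_to(BRANCHES, "left", "prey"), leads_to(BRANCHES, "right", "prey"))
```

Once a route has been selected it can be followed without further sight of the goal, which is the offline feature of interest. The plan also encodes an impossibility: no route through "left" reaches the prey.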
Another kind of offline use involves passing information to another agent: e.g. pointing at where the fruit is, or telling someone where it is, or explaining how to get to it. (These are three different cases.) It is easy to think of other cases of offline use of perceptual information: left as an exercise for readers.
Gibson's idea that the main function of vision is to provide information about affordances can be further generalised to include the role of vision not only in acquiring information about possible actions of the perceiver (used in either online or offline intelligence), but also information about possible changes in the environment, and constraints on those changes, irrespective of whether the changes are produced by the perceiver, and irrespective of whether the changes are known to be relevant to the current or future needs or interests of the perceiver. An example would be noticing the possibility of the fruit falling and hitting a branch below it. Every physical configuration of objects has multiple possibilities of change that can be understood by perceivers who have no interest in whether the changes occur or not. I call those "proto-affordances". See, e.g., Hartson (2003), Sloman (2008) and Siegel (2014). Examples of scenes with multiple proto-affordances are presented below.
A quite different approach to representing possibilities is to switch to topological, or more generally relational and structural descriptions. Such descriptions can specify parts, and relationships between parts, possibly parametrised relationships; e.g. A and B meet at an angle that is smaller than the angle between C and D, or the distance between A and B is less than the distance between C and D, or A, B, C, and D are parallel with gaps of increasing size along the sequence, or the vertex at which A and B meet lies on C, and many more.
The availability of such non-numerical representations of structure can make it inappropriate (and wasteful) to use the most powerful mathematical methods for representing processes, e.g. using differential and integral calculus. It can be especially wasteful and inappropriate if the precision of those methods is overkill for the information needs of an animal (e.g. answering "Am I getting closer to my prey?"). Use of qualitative, topological, comparative, imprecise descriptions may allow far greater generality at the cost of some ad-hocery: e.g. learning about special important cases (structures) and describing others in terms of them.
The use of grammars and parse trees in linguistics and in compiler design illustrates the power and versatility of non-numerical forms of representation, for some purposes. In the 1960s, Clowes and others proposed similar techniques for visual systems, including claiming that, like sentences, pictures/images could have (two-dimensional) syntactic structures representing (in many cases three-dimensional) contents, though finding appropriate generalisations of the notions of "grammar" and "semantic content" was not easy: see Kaneff(1970).
Chapter 9 of Sloman (1978) provided a demonstration of multi-layer semantics, as a proof of principle, showing how images can often be interpreted as having several distinct levels of semantic content, detected in messy images by the POPEYE program. (At that time AI theories of vision as requiring a mixture of bottom up and top down -- and middle out -- concurrent processing were unfashionable. An application for funds to continue the research was refused, and a paper reporting the work proved unpublishable.)
Could challenges for online intelligence lead to a new kind of offline intelligence?
Is it possible that mechanisms that originally evolved to serve fast-acting behavioural reflexes, may later have been modified to serve fast-acting mechanisms for building new temporary internal information structures triggered by the contents of fast-acting sensory buffers developed for online intelligence (type (a) above)?
During speech understanding these extended online control mechanisms would construct intermediate information structures representing phonemes, morphemes, words, phrases, clauses, sentences and other linguistic entities.
But before that, evolution may have produced older visual mechanisms that rapidly constructed a variety of information structures about aspects of the environment, including parts of familiar objects, combinations of familiar objects, possible action trajectories, possible consequences of such actions, possible non-action processes, and combinations of all the above to form information structures about a complex process, such as a person eating a sandwich, the perceiver assembling ingredients for making a sandwich, locations where ingredients might be stored, and various partially assembled sandwich states of changing complexity. For some of the entities thus recorded at high speed, e.g. various objects, spaces, spatial relations, and motions, the relevance to possible future moves by the viewer may also be derived and recorded. That could be the birth of certain kinds of affordance perception discussed by Gibson.
If perception of a static scene can trigger rapid construction on varying spatial scales and temporal scales, with varying combinations of concreteness and abstractness, then perception of a complex moving scene, or a complex static scene perceived by a moving viewer will require mechanisms that can rapidly modify the information structures, driven by information about changes in receptor information contents in combination with other information, including information about the viewer's actions, and additional background knowledge about the type of environment.
If all that apparatus for motion perception is already available to deal with a wide variety of types of motion, whether motion of the viewer, or motion of perceived objects or both, then perhaps the same apparatus can also play a role in a new kind of perception of static scenes, by implicitly representing widely varying possibilities that cover things that could happen in such a situation.
If the mechanisms for abstraction are available for dealing economically with actual processes they may also allow representation and reasoning about possible processes.
These generalisations of Gibson's ideas seem to be crucial for understanding mathematical cognition in humans, other animals and possible future robots. That's because key forms of mathematical discovery are concerned with what is possible and what is impossible, and how the set of possibilities and impossibilities relevant to a situation can change if some of the possibilities are realised. Your possibilities for action and perception outside a doorway are different from the possibilities just inside the doorway.
Some of those possibilities if realised, will necessarily have certain consequences. For example if there are several coins on the table each will either have heads up or tails up. Then turning any of them over will switch to the other state. If it was heads up it will be tails up, and vice versa. But there are more subtle examples. If there are two coins then their states may be the same or different: both heads, both tails, or one heads and the other tails (two more cases).
If two turning-over processes occur: involving either or both coins, then if they initially had the same state they will end up with the same state, and if they initially had different states they will end up with different states -- after two turns. Anyone reading this document should be able to work out why that consequence must follow. (I may add an explanation later.)
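The two-coin claim can also be checked by brute force. The following is only an illustrative sketch, not a substitute for seeing why the result must hold; the function names (`flip`, `same_state`, `check_two_flips`) are hypothetical and introduced here purely for the example. It enumerates every starting state of two coins and every way of applying two turning-over operations (to either coin, or to the same coin twice).

```python
from itertools import product

def flip(state, index):
    """Return a new state tuple with the coin at `index` turned over."""
    coins = list(state)
    coins[index] = "T" if coins[index] == "H" else "H"
    return tuple(coins)

def same_state(state):
    """True if both coins show the same face."""
    return state[0] == state[1]

def check_two_flips():
    """Check every start state and every ordered pair of flips."""
    for start in product("HT", repeat=2):
        for first, second in product(range(2), repeat=2):
            end = flip(flip(start, first), second)
            # "same" must stay "same", "different" must stay "different"
            if same_state(start) != same_state(end):
                return False
    return True

print(check_two_flips())  # True
```

The enumeration succeeds for a reason that needs no enumeration: each single turn changes "same" into "different" or vice versa, so two turns always restore the original relationship.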
To make these mathematical discoveries you don't need to collect statistical information or have any knowledge of the probabilities of various configurations or any understanding that there are ratios among sets of possibilities.
Of course, discovering such ratios can provide knowledge of probabilities, for example the probability that if coins are randomly placed on squares on a chess board, one coin on each square, the four corner coins will all be the same way up, i.e. four heads or four tails. (Even non-mathematicians may be able to work out that probability.)
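For readers who want to check their answer, here is a minimal sketch that counts the ratio directly. It assumes (my assumption, not stated above) that each coin is equally likely to land either way up and that the coins are independent, so only the four corner coins matter; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def corners_all_same_probability():
    """Ratio of corner outcomes that are all heads or all tails."""
    outcomes = list(product("HT", repeat=4))           # 16 equally likely cases
    favourable = [o for o in outcomes if len(set(o)) == 1]
    return Fraction(len(favourable), len(outcomes))

print(corners_all_same_probability())  # 1/8
```

Two of the sixteen possible corner configurations qualify (four heads or four tails), which is where the ratio comes from.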
Contrast Mary Pardoe's proof of the Triangle Sum Theorem, in the form: the interior angles of any planar triangle must sum to a half rotation (180°):
In a plane surface, rotating the blue arrow through the three internal angles (i.e. A, then B, then C) always brings it back to the starting line, pointing in the reverse direction, without ever crossing over its original orientation, and this (obviously?) doesn't depend on the shape of the triangle.
The above example illustrates ways in which possibilities and necessities or impossibilities can be closely related: realisation of some possibilities may necessarily have certain consequences. What they are and why they are inevitable differs from case to case. The ability to notice possibilities and impossibilities (necessities) and the consequences of realising some possibilities in a situation is an important aspect of human development, as Piaget noticed. His last two books (1981-1983) discuss many examples.
However, although Piaget realised that these are important aspects of human cognition, and some of his ways of probing children's minds are based on deep insights about varieties of cognitive function, I am not sure that his theories regarding the cognitive mechanisms (which I found hard to follow) were sufficiently well developed to be useful, e.g. in explaining mathematical cognition, or in designing intelligent machines with human-like powers of mathematical discovery. Piaget's work on cardinality is mentioned below.
One of the problems of discussing such issues is that there are so many different types of case, and we need to understand the variety in order to come up with good theories about what's going on. In particular there are some cases where the cognitive competences involved are purely logical reasoning capabilities, whereas in other cases more varied mathematical abilities are required, e.g. concerned with reasoning about spatial structures and processes, as in topology and Euclidean geometry.
Many of Escher's pictures have far more complex examples of this, e.g. his Waterfall picture (https://en.wikipedia.org/wiki/Waterfall_(M._C._Escher))
Consider possible configurations of a scene containing items in different arrangements.
Alternative configurations of the blocks are presented below. You can probably imagine a series of individual block-trajectories that would transform the above figure into the one below, and similarly for (most of) the later examples.
Two more possible configurations of eight blocks.
However such programs did not have the ability to suggest, or reason about, alternative configurations of blocks: they saw only what existed. Moreover they were not able to assign precise lengths or angles in all the images in which they could perceive structure. And although they could in some cases detect that one visible surface must be further from the viewer, they did not reason about whether the whole scene depicted was consistent. So they could not detect circular "further than" relationships, though I suspect that could have been added. But that's just a special case of detecting impossibilities.
Thinking about, or imagining possible variations in a scene is a crucial ability for many intelligent animals, including nest-builders, hunters, and animals that care for their young.
Humans can do this not only for real scenes containing physical objects but also for depicted scenes: where the pictures specify physical objects in physical relationships, including relationships like adjacency, co-linearity, being above, being between, or supporting.
They presumably cannot do all this at birth. Why not? What mechanisms do they lack? How do they acquire the mechanisms that provide the new abilities later on? Is it merely a process of learning to use mechanisms they already have from birth (or from before birth -- e.g. from month X of foetal development), or do new brain mechanisms grow during years of physical growth (and thereafter)?
Here are three more possible configurations with 8 blocks in each. Ignore the problem of gravity for now: the blocks could be held in those locations or they might be in a location with zero gravity or the "suspended" blocks could be lowered to the surfaces below them. You can probably imagine several different sets of trajectories of individual blocks that would produce each of the three new scenes.
Here are nine blocks on a surface, shown in three possible configurations, including one in which one of the blocks is suspended above (or floats above) another block. How many other configurations are possible? How else could they be arranged? What sequences of block moves could produce the new arrangements?
Note: Piaget asked children that sort of question using a few objects on a flat surface (1981). Not all realised that there is no fixed finite set of possibilities. Some answered after a while that no more arrangements were possible.
Other things you can do in the perceived configurations include swapping pairs of blocks: move one onto the table, move another block to the newly emptied location, then move the first block into the new space. Or move both simultaneously using two hands.
Several more configurations of nine blocks are depicted below. You may or may not find one of them anomalous.
Yet more possible configurations of 9 blocks. Are they all really possible? See text for discussion.
(Inspired by Reutersvard's 1934 drawing.)
As before you can visualise ways of rearranging the blocks or moving your hand between the blocks, as described above in Section Possible moves. Look closely at the differences between the last two configurations. The left and middle pictures (J and K) depict perfectly possible 3-D configurations of cubes (though in a normal gravitational field something would be required to hold in place the cubes that are not resting on the table or on other cubes).
But there are subtle 2-D features of the rightmost picture L that indicate that the 3-D configuration that it represents, if interpreted as a picture of 9 cubes, involves a collection of pair-wise relationships between the cubes that are all possible in isolation, but not all possible in the same 3-D configuration. This impossibility does not arise out of any mis-use of pictorial conventions. The image uses only examples of image fragments that occur in other pictures of configurations that are perfectly possible.
The fact that the scene is experienced as impossible only if all the blocks are included challenges theories about limitations of numbers of objects that can be attended to simultaneously.
Examining the image L you should be able to imagine ways of removing one block that would leave the object depicted impossible, and also ways of removing other individual blocks that would render the scene perfectly possible.
When a 3-D scene depicted is geometrically impossible there need be nothing impossible about the configuration of lines in the picture. The impossibility concerns which 3-D structure, if any, the picture depicts if all the parts are interpreted normally as depictions of 3-D structures and relationships.
Another view of the transition from part of Figure J to Figure L above. The image above left depicts a possible 3-D scene. Modifying it as on the right produces a picture that, if interpreted using the same semantic principles, represents an impossible 3-D scene, where blocks A, B, C form a horizontal line, blocks F, G, H form a vertical line, D and E are between and on the same level as C and F, and the new block X is co-linear with A, B, and C, and also with F, G, and H -- impossibly! Notice how the relationship between A and H has changed.
The drawing on the right (minus labels) was by Swedish artist, Oscar Reutersvard, in 1934.
Compare the above two pictures. A complex picture made of parts representing possible 3-D configurations may have N parts such that if a certain part X is added (e.g. a picture of an extra block that is simultaneously co-linear with two other linear groups, as in the above figure on the right), then it becomes anomalous and cannot represent a 3-D configuration using the same rules of interpretation (based roughly on reversing projections from 3-D to 2-D). Notice that in this case, the addition of X required changes to the (2-D) depictions of blocks A and H that preserved the 3-D relationships between A and B, and between G and H, but altered the 3-D relationship between A and H, depicting A as occluding H. That produces a contradiction even if block X is not depicted. If X were removed, more of H would be visible, but the impossibility would remain.
In other words the original N parts have a joint interpretation that entails that the situation depicted by adding the part X cannot exist, not because of the addition of the new block, but because of a subtle change in the relationships between pre-existing blocks. If the blocks were depicted spread out more in space, so that they are not overlapping, this change would not be necessary, but various relationships would become more ambiguous.
This is partly analogous to logical reasoning where N consistent propositions entail that an additional proposition X is false. So its negation can be inferred to be true.
This picture cannot be handled by the Huffman-Clowes line-labelling mechanism described in Clowes(1971) as it requires a richer grasp of geometry than the line-labelling provides (3-D relationships between adjoining or connected portions of an interpreted image). It requires an ontology of opaque 3-D objects with relationships between whole objects, not merely an ontology of edges, vertices and faces, with 2-D and 3-D relationships between them.
Humans can reason that the configuration on the right is impossible without knowing any of the actual distances or sizes, whereas I don't believe any current AI vision system can do that, though it may not be very difficult to implement to deal with the special case of opaque rectangular blocks in static scenes.
The detailed requirements for the richer ontology, if extended beyond 3-D objects bounded by plane surfaces, and beyond rigid objects (e.g. to include objects made of different "kinds of stuff"), and beyond static configurations, will vary for different species of animal, and for different developmental stages in the same species. As far as I know very little of this is in any current AI systems (or psychology, or neuroscience).
I see no reason to believe these capabilities could be acquired by any of the forms of learning currently fashionable in AI/Robotics. Much deeper epigenetic mechanisms are required e.g. as speculated in connection with Figure Evo-Devo, below.
This requires researchers themselves to develop deeper (meta-cognitive) ideas about forms of geometrical and topological perception and reasoning.
Moreover I don't think neuroscientists have any idea how brains can support this kind of reasoning.
The above example is partly comparable to a collection of sentences, each of
which describes a perfectly possible state of affairs, though their conjunction
does not, e.g. "Tom is older than Dick", "Dick is older than Harry" and "Harry
is older than Tom".
"Older than" is a transitive relation, which means that
"X is older than Y" and "Y is older than Z" together imply "X is older than Z".
So the first two conjuncts above imply "Tom is older than Harry" but that contradicts the third one because it is not possible for Tom to be older than Harry while Harry is older than Tom. ("Older than" is an anti-symmetric relation.) Why it is impossible, and how it is possible for an individual (human, other animal, or intelligent machine) to know that a relation is transitive and anti-symmetric, will not be discussed here.
The situations depicted in the pictures of blocks are more complicated than the linguistic example because there are several different relationships, including "further from the viewer", "higher than" or (further above the surface of the table), and "further along" in various directions in the scene, all of which are transitive and antisymmetric relations. It is left as an exercise for the reader to work out which 3-D relationships between blocks or between groups of blocks are depicted in the various pictures, and which combinations are inconsistent.
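The purely relational character of this kind of inconsistency can be illustrated with a small sketch. It is not a model of how humans or other animals detect the impossibility, and it says nothing about how the relations would be extracted from a picture: it merely assumes that statements of the form "x R y" have already been obtained for a single relation R that is transitive and antisymmetric (such as "older than" or "further from the viewer"), and searches for a cycle. The function name `find_cycle` is hypothetical.

```python
def find_cycle(pairs):
    """Return a list of items forming a cycle in the relation, or None.

    `pairs` is an iterable of (x, y) meaning "x R y", where R is assumed
    to be transitive and antisymmetric, so that any cycle is a contradiction.
    """
    graph = {}
    for x, y in pairs:
        graph.setdefault(x, []).append(y)
        graph.setdefault(y, [])

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    status = {node: UNSEEN for node in graph}

    def visit(node, path):
        status[node] = ACTIVE
        path.append(node)
        for nxt in graph[node]:
            if status[nxt] == ACTIVE:
                # nxt is already on the current path: a cycle has closed
                return path[path.index(nxt):] + [nxt]
            if status[nxt] == UNSEEN:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        status[node] = DONE
        return None

    for node in graph:
        if status[node] == UNSEEN:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None

statements = [("Tom", "Dick"), ("Dick", "Harry"), ("Harry", "Tom")]
print(find_cycle(statements))      # ['Tom', 'Dick', 'Harry', 'Tom']
print(find_cycle(statements[:2]))  # None
```

For the block pictures the same check would have to be run separately for each relation ("further from the viewer", "higher than", "further along" in a given direction), and no distances or sizes are needed at any point.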
Aviv Keren drew my attention to a closely related paper by Roger Penrose (1992), in which the impossibility of a Penrose triangle is related to the mathematical concept of a "cohomology group". The ideas are introduced in terms of ratios of distances of objects along a viewing direction, which presupposes that distances have a metric (though I have not yet understood all the mathematical details of the paper). I have tried to show how familiar qualitative relationships of the form "further in direction D" that are transitive and antisymmetric can suffice to explain the perceived impossibility, without attributing to the viewer an understanding of coordinate geometry, or use of a metric for distance. (I suspect that the discussion by Penrose uses a special case of this.)
There are strong pictorial clues for partial occlusion. For example, a pictorial "T" junction is often used to indicate that a partly visible edge where two surfaces meet, represented by the stem of the "T", is occluded by another surface whose edge forms the crossbar of the "T" (a clue used since the 1960s by AI vision researchers).
Using transitivity and antisymmetry of "further" is easier in connection with the Reutersvard triangle than with the Penrose triangle (referred to as a "tribar" in his paper). In fact the Reutersvard scene includes not only violation of antisymmetry of a collection of spatial relations, but also includes a large collection of affordances (in the sense of Gibson discussed and illustrated above) concerning possible moves of the blocks and possible moves of other objects (e.g. a flat hand) in the spaces between the blocks that form an impossible collection. Details are left as a further exercise for the reader. (Or a future AI program demonstrating its spatial understanding!)
The pictures of impossible objects and sentences describing impossibilities both illustrate a deep and important point: if a form of representation (pictorial or linguistic) is to be suitable for expressing information about a rich domain of possible structures and processes, there may be no syntactic constraint on pictorial or verbal modes of composition, that is guaranteed to prevent self-contradiction AND provides the desired expressive power.
Conjecture: There is no drawing convention that allows all the possible configurations of nine blocks to be depicted, and which does not also allow the depiction of impossible scenes, like those shown above.
Compare: there is no generally useful human language that allows everything to be expressed that we might wish to express but prevents description of impossible configurations. Even the language of arithmetic allows us to formulate propositions that cannot be true, e.g.
3 + 5 < 6
Some mathematicians and programming language designers have attempted to design syntactic rules for languages that guarantee the impossibility of expressing something that is impossible. I believe that cannot be done for natural languages without intolerable restrictions in usefulness (Sloman(1971b)). I believe the same can be said regarding languages for use in AI projects aiming to replicate human intelligence.
The ability to notice impossibilities is an aspect of intelligence. For example, an animal thinking about possible ways to move a physical object to get it from one location to another location should be able to detect, at least in some cases, that moving the object through a particular gap is impossible, because the object is too wide. The ability of a 2-D structure to depict an impossible 3-D structure, and the ability of (some) humans to detect the impossibility have been much studied. But I think many of the examples have subtle clues about functions and mechanisms of vision that have not been appreciated. This is particularly evident in the 1934 picture produced by Reutersvard, presented below.
In the above examples, I have tried to show how to build up to Reutersvard's example gradually, in order to get an accurate account of the phenomenon, by locating the discovery of impossibility in a space of possible actions using vision along with a visualisation of a target configuration.
Although James Gibson did not, as far as I know, ever discuss such examples (or the others mentioned below), I hope it is clear that the discovery of such impossibilities could arise in the context of perceiving affordances in the environment, i.e. possibilities for change of spatial relationships, and using those perceived affordances to construct something. The formation of an affordance-based intention generally leads to either a plan or an exploratory process that eventually culminates in construction of the intended spatial configuration. But in special cases it is possible to discover that the intention cannot be fulfilled because of negative, obstructive, affordances, which in some cases cannot be overcome using greater strength, new materials, collaboration with helpers, etc. I suspect that such discoveries could, over hundreds or thousands of years, have led our ancestors to formulate mathematical theories about spatial structures including discovering proofs of the sort presented by Euclid (Elements).
But even if these historical conjectures are correct, that still leaves us with the problem of explaining how the impossibilities are understood: what sorts of cognitive mechanisms allow proofs of impossibility to be constructed? What forms did those proofs take? How are they related to the processes in the mind of a child, or a squirrel, or a crow, who not merely tries and fails in a task, but comes to understand the failure? At present I don't believe there is anything in psychology, philosophy, neuroscience, or AI that provides a rigorous explanation. Part of my reason for collecting a large and varied set of examples is to build up requirements for explanatory mechanisms. Combined with evolutionary investigations and exploration of designs for intelligent robots, we may be able to come up with a good theory that can be implemented and tested. I don't think anyone has such a theory at present.
While a student, he drew the star at the centre, then added more lines, ending up with his impossible configuration, several years before the pictures of Penrose and Escher.
An important feature of this picture is the number and variety of possibilities for change implicitly depicted: e.g. all the places where you could put your hand between two of the cubes, all the cubes that could be removed leaving a gap, producing a possible 3-D configuration, all the pairs of cubes that could be swapped, etc. So we have a very rich collection of imagined structures, relationships and possibilities for change, including cases where what is imagined is geometrically impossible. Compare the following description of a collection of numbers.
There are nine numbers, a, b, c, d, e, f, g, h, i, all positive.
a + b = c
d + e = f
c + f + h = i
i < a
What conclusions can you draw from the first three equations? Could the three equations and the inequality all be true? How do you know?
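One way of answering, sketched here as a short derivation rather than as the only possible route, uses nothing beyond the three equations and the stated positivity of the numbers:

```latex
\begin{align*}
i &= c + f + h \\
  &= (a + b) + (d + e) + h \\
  &= a + (b + d + e + h) \\
  &> a \qquad \text{since } b, d, e, h > 0 .
\end{align*}
```

So i < a cannot hold together with the three equations, whatever the particular values are. Each statement is satisfiable on its own, but the collection is not, and no measurement of the actual numbers is needed to see that.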
Reutersvard went on to produce many variants of his idea and a selection of his
pictures were used in Swedish stamps. More pictures by Reutersvard are available
It is arguable that Escher's pictures of impossible objects or scenes were more
subtle and creative, with their rich blends of geometric and biological forms,
but that feature is not relevant to our current discussion:
In this case not all of the impossibilities are geometrical.
This YouTube video demonstrates Gregory's non-triangle:
A YouTube video showing construction of an "impossible" triangle using dice.
Watch the sleight of hand just after 1min9secs. Compare this version:
What brain mechanisms support the information processing that allows you to generate interpretations of fragments of the image, and also to detect and rule out combinations of fragments that represent impossible configurations? Do you have to be trained on many examples of tangled arms, legs and bodies? Or can generic spatial reasoning abilities be combined with structural knowledge about human bodies?
I suggest that the mechanisms involved in rejecting wrong interpretations of ambiguous views of spatial structures in novel scenes are the same sorts of mechanism as were used by ancient mathematicians, in making geometrical and topological discoveries. At this stage I suspect nobody knows how biological brains can do that sort of thing. I'll turn now to a set of different, though partly related, examples.
A slightly harder question: what cognitive capabilities would a child or a robot require in order to answer the question?
An even harder question: what kinds of evolutionary transitions might have produced the capabilities required for this task? (Later we shall contrast physically constrained and rule-constrained versions of the same problem.) What sorts of naturally occurring situations might have this sort of structure? What kinds of brain mechanisms might have been required in our earliest ancestors capable of answering this sort of question?
Some brain mechanisms would allow the question to be answered only for small groups of tiles, e.g. two or three of each colour. Do your brain mechanisms have such a limit, and if not why not?
Could a robot acquire the required abilities by being trained on lots of examples? If not, how could it be given those abilities -- i.e. what sort of artificial brain could deal with such problems, without being restricted to particular sizes of grid and particular numbers or configurations of the red and grey tiles?
The next example is partly similar, and partly different. Similar questions can be asked about brain mechanisms required, and their evolutionary origins.
On the left grid (a) the coloured squares can slide horizontally or vertically but not diagonally. Some readers will recognize this as mathematically related to a very well known puzzle that at first looks totally unrelated. If you don't recognize it follow [*]this link. (There may be better explanations on the internet.)
On the right grid (b) the coloured rectangles can be rotated, and can slide between squares only in diagonal directions, not horizontally or vertically.
Subject to those constraints can the three grey items slide to the squares containing the three red items in each case (or vice versa)?
If not why not?
What kind of information processing system can allow an animal or robot to think about what transformations between configurations are and are not possible?
How is this related to mathematical discoveries in geometry, topology and arithmetic?
Do the shapes of the objects matter to the solution?
How is this related to perception of affordances in everyday life?
By what means could those discovery abilities possibly have evolved? What sorts of transitions in functions of vision, or more generally functions of perception, might have led up to such competences?
Can these mechanisms and processes be implemented using current tools and ideas for programming intelligent robots?
Clearly newborn human infants cannot answer these questions about objects sliding around on grids, or similar questions using different spatial configurations. Why not? In what ways would their brains have to change, or be changed, in order to enable them to think about such problems and not only work out the answers but understand that the answers are necessarily correct, and could not be different for grids and tiles made of different materials, or located at different altitudes above sea level or even on another planet?
How might Gibson's theory of perception of affordances have to be revised to cope with these questions?
While thinking about the above problems did you consider the possibility of altering the grid of squares so that they have two colours, like a chess board, with diagonally adjacent squares the same colour and horizontally or vertically adjacent squares different colours? That transformation makes the answers to the second set of questions, with two types of grid, trivial to answer. See this link[*] (same link as above).
What kind of brain mechanism in a human or a robot allows that sort of solution to be discovered and used in a proof? Curiously, many highly intelligent humans who already have the required brain mechanisms and knowledge of chess boards don't notice the relevance -- perhaps because they have too much potentially relevant knowledge.
For example, in my experience some expert mathematicians immediately notice that assigning a coordinate frame to the grid gives each square two coordinates and their sum (or difference) is either odd or even. So they try to find answers in terms of parity-preserving operations, using their knowledge of arithmetic and algebra. This leads to a mathematically acceptable solution, but non-mathematicians who merely notice the consequences of having two colours alternating horizontally and vertically, as on a chess board, can also find a mathematically acceptable proof without mentioning coordinates of squares or division by 2, etc. They are simply using a different sort of mathematics, closer to the reasoning in Euclid's Elements.
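The coordinate-based version of the argument can be sketched as follows. This is an illustration of the parity idea only: the 4x4 grid size and the names `colour`, `colour_changes`, `ORTHOGONAL` and `DIAGONAL` are assumptions introduced for the example, and the sketch deliberately says nothing about where the red and grey items actually are in the figures.

```python
def colour(square):
    """Chess-board colour of a square: parity of row + column."""
    row, col = square
    return (row + col) % 2

ORTHOGONAL = [(0, 1), (0, -1), (1, 0), (-1, 0)]
DIAGONAL = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def colour_changes(steps, size=4):
    """Collect, over every in-grid step of the given kind,
    whether the step changed the colour of the occupied square."""
    changes = set()
    for row in range(size):
        for col in range(size):
            for d_row, d_col in steps:
                new_row, new_col = row + d_row, col + d_col
                if 0 <= new_row < size and 0 <= new_col < size:
                    changed = colour((row, col)) != colour((new_row, new_col))
                    changes.add(changed)
    return changes

print(colour_changes(DIAGONAL))    # {False}: a diagonal slide never changes colour
print(colour_changes(ORTHOGONAL))  # {True}: a horizontal or vertical slide always does
```

Since a diagonal slide never changes colour, an item restricted to diagonal slides can never reach a square of the other colour, however many moves are made. The enumeration confirms this for one grid size, whereas the colouring argument shows it must hold for any size.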
What sort of brain mechanism is required to enable a person presented with a problem to notice the relevance of apparently unrelated knowledge? [*]
The Mutilated chessboard problem
Two relevant books:
George Polya, How To Solve It, Princeton University Press, 1945.
Max Wertheimer, Productive Thinking, Harper, New York, NY, 1945.
Note that unlike the previous examples, in Case (C) we can't draw the impossible configuration under discussion, namely a regular NxM array made up of exactly seven blocks. Case (C) would be dealt with easily by someone who has already learnt about prime numbers, and understands the unique factorisation theorem. But perhaps playing with the cube rearrangement task could lead a bright child to notice the impossibility, and eventually prove that the problem is not one of a failure to explore enough configurations, even without previously having learnt about prime numbers. Exercise for the reader: what would have to go on in the child's mind for this to happen?
One of the features of mathematics is the variety of interconnections between different problems. Sometimes essentially the same mathematical problem is discovered in quite different contexts. I think even human toddlers and intelligent non-human animals can make such discoveries and use them, but without being aware that they have done so, and consequently not being able to ask or even think about the question: "How do I know that no exception will be discovered on a high mountain, or at a freezing temperature, or while travelling in space?" They are incapable of noticing the epistemological features of the mathematical discoveries they use. And if they grow up to be philosophers with no understanding of computation, they may misdescribe what they have learnt.
A child given a set of wooden cube-shaped blocks can do all sorts of experiments -- exploring the space of processes involving the blocks.
A child may notice that in certain cases, attempts to rearrange a configuration into a rectangle always fail: What kind of experimentation can that provoke, and what sorts of discoveries can be made?
How could one be sure that there is NO way of arranging the last collection into a rectangular array, apart from the straight line shown? Could a child playing with such blocks (or discs, or other movable objects) discover the concept of a prime number? I suspect it could be done using an ability to "carve up" a spatial region in a systematic way and then confirm by exhaustive analysis that no possible distribution of the blocks produces a rectangle, other than the co-linear arrangement.
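The "carve up and check exhaustively" idea can be mimicked mechanically. The following is a minimal sketch, assuming that a candidate arrangement is produced by choosing a row width and filling rows from the top until the blocks run out; the function names lay_out and rectangles are hypothetical. It deliberately avoids any mention of division or primes: it only lays blocks out and looks at whether the last row is full.

```python
def lay_out(n_blocks, width):
    """Fill rows of the given width until the blocks run out.
    Returns the number of blocks in each row, top to bottom."""
    rows = []
    remaining = n_blocks
    while remaining > 0:
        row = min(width, remaining)
        rows.append(row)
        remaining -= row
    return rows


def rectangles(n_blocks):
    """All (rows, width) arrangements with at least two rows and at
    least two columns in which every row is completely filled."""
    found = []
    for width in range(2, n_blocks):
        rows = lay_out(n_blocks, width)
        if len(rows) >= 2 and rows[-1] == width:
            found.append((len(rows), width))
    return found
```

For seven blocks rectangles(7) is empty, whereas rectangles(6) contains both the 3 by 2 and the 2 by 3 arrays. The program finds this by trying every width; what remains unexplained is how a child could come to see that no further widths need to be tried and that the failure is not due to insufficient exploration.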
When I discussed this hypothetical example (discovering theorems about factorisation and prime numbers by playing with blocks) with some people at a conference, one of them told me he had once encountered a conference receptionist who liked to keep all the unclaimed name cards in a rectangular array. However she had discovered that sometimes she could not do it, which she found frustrating. She had unwittingly discovered empirically that some numbers are prime, though apparently she had not worked out any mathematical implications.
Could the child rearranging blocks discover and articulate the fundamental theorem of arithmetic? (The unique factorization theorem.)
However the physical world leaves open the possibility of lifting the tiles and moving them to new squares, a process that would be unconstrained in both varieties of the puzzle presented earlier. Someone wishing to achieve the specified end state could therefore ignore the constraints and make more direct moves.
But for some reason there could be a preference for moving the tiles along the constrained paths, requiring routes to be found, when possible. In that case, the preference could arise even if there were no physical constraints: one can "playfully" explore what is possible (a) when only vertical and horizontal moves are considered and (b) when only diagonal moves are considered. The same sort of mathematical reasoning would be relevant both to the situation with physical constraints and to the situation with non-physical constraints, only freely adopted rules.
One consequence of this is that an engineer who discovers "in the abstract" that certain sorts of constrained moves would have useful consequences in some situation can then construct mechanisms in which physical structures impose those constraints, so that all the permitted changes in those physical structures necessarily have the desired properties. Examples include designing channels for flow of water or other liquids, designing grooves along which balls can roll, designing rails to control motion of trucks, designing linkages (e.g. to produce bi-stable car boot (trunk) lids), designing gears that control the relative speeds of rotation of two axles (and things attached to them), and many more. (The use of tools, a focus of much research in psychology, is a special case of this phenomenon: tools are aids to controlled matter manipulation. Some of them transform forces in addition to constraining motion, e.g. levers, gear-wheels, screw-drivers, pincers, etc.)
The fact that mathematical investigations can be addressed in contexts where structural constraints are "freely adopted" could lead (and I think has led) some philosophers to the mistaken conclusion that mathematics is a human creation, and contains only freely created constructs, with no absolute necessities. Wittgenstein famously wrote: "For mathematics is after all an anthropological phenomenon" (in Remarks on the Foundations of Mathematics). But clearly the consequences of freely adopting a precisely defined constraint are not themselves freely adopted, any more than the consequences of a strong physical constraint with the same structure are freely adopted.
Moreover, it is not only humans who discover and use mathematical structures and their properties: other intelligent animals do also. E.g. weaver birds make use of mathematical properties of knots. Moreover evolution (natural selection) has discovered and made use of many mathematical facts, e.g. that certain sorts of control systems (using negative feedback to achieve homeostasis) will produce stable temperatures or pressures or orientations. Many more subtle mathematical facts must have been used to allow control systems in developing brains to control various kinds of motion by changing their details (their parameters) during growth of an organism, a process in which absolute and relative sizes of body parts change, and their weights, moments of inertia, strength and other features change, but without requiring growth of new brains (or sub-brains) to control the physical configurations with all their new properties and relationships.
Such control mechanisms can be thought of as using primitive "grammars" for processes. This may be relevant to unanswered questions about evolution of languages for internal use and for communication. Sloman[Vis-Lang].
This discussion raises interesting biological questions. What differences are there between brains that can solve problems involving satisfying constraints only when the constraints are externally imposed, e.g. by physical structures, and brains that can also solve similar constraint satisfaction problems where the constraints are not freely chosen, but adopted for some practical reason, e.g. using rules as constraints in reasoning about a diagram representing a physical structure with corresponding physical constraints, like an architect deriving consequences of some design decisions by reasoning with architectural drawings?
This is part of the evidence that evolution can be usefully considered to be a "blind mathematician", a view discussed in several of the Meta-Morphogenesis project papers.
In the context of the discussion of evolved construction-kits in a separate document construction-kits.html a distinction is made between physical, abstract and hybrid construction kits. A game in which players are constrained by physical structures is a kind of physical construction kit. If some of the physical constraints are removed, like the constraints restricting the trajectories of tiles on a surface, but the effects of the constraints are adopted as constraints on solutions to problems (i.e. rules restricting possible actions), then the result is a "hybrid" construction kit. Many games played by humans, such as soccer, cricket, tennis, and others are based on such hybrid construction kits.
In some cases, such as chess, or draughts (checkers) or GO it is even possible to play the game without making any use of physical pieces or a board, by simulating their effects. Such a game is then a purely abstract construction kit. A great deal of mathematics is concerned with investigation of properties of such abstract construction kits. But many of the original mathematical discoveries were based on concrete versions of those construction kits, where constraints (or rules) came from physical structures not from intentions of players or social agreements.
How, why, when, and how many times did evolution produce new brain mechanisms making such problem solving (and problem recognition) capabilities possible? Which were the earliest cases in evolutionary trajectories leading to modern humans? Producing such mechanisms is itself a partly mathematical problem, of finding structures and rules that support the discovery processes. Presumably the ability to solve problems arising in the use of abstract construction kits built on evolutionary transitions that solved useful practical problems, and had additional powers beyond the powers required for the specific problems encountered.
Added 10 Nov 2016
Some of these ideas are taken a bit further in a paper (written October/November 2016) discussing the nature of mathematics and the argument that mathematical discoveries were made and used by natural selection long before there were human mathematicians.
Chair too wide to slide through doorway.
Can you see an alternative possible arrangement that allows the chair to go through the doorway (without being dismantled, folded, etc.)?
Of course the possibility of getting the chair through the door existed from the start, but it was temporarily blocked by the orientation of the chair and the width of the door. Changing the chair's orientation produced a new direct possibility.
Many action plans created by humans and other intelligent animals depend on the ability to recognize and reason about sets of possibilities and impossibilities, and ways in which they can be chained usefully.
This is a common feature of biological processes: some new possibility once realised removes previous possibilities (moving downwards) and thereby enables further new possibilities (reaching to a new height). This depends on the rigidity and strength of the material used. The ability of plants to grow upwards (e.g. to get more light) depended on evolution of mechanisms making possible the growth of materials that reduced or removed possibilities of bending, i.e. rigid materials. Without such changes to the materials constructed during development, plants could not have evolved on dry land as we know them. Giant redwood trees are an extreme example. Roses or sunflowers held up by stalks would also have been impossible.
Terrence Deacon (2011) seems to me to be confused about achievements of biological evolution because he emphasises the negative aspects (constraints) in new developments (e.g. production of rigid materials) without noticing the positive aspects (e.g. enabling new possibilities, such as supporting heavy structures). Anyone who has played with construction kits, such as Meccano, will have made use of the fact that separate parts can be locked together (e.g. forming a hinge) thereby both constraining their independent motion and enabling new possibilities, such as building a structure with a part that can move relative to another part, such as the jib of a crane. Related points about molecular level construction kits were made in What is life?, by [Schrödinger 1944].
So new developments may be both negative (constraining) and thereby also positive (enabling) at the same time. This is part of the intrinsic nature of possibility and necessity in spatial structures, relationships and processes, with rich implications in structures with non-rigid spatial relationships between rigid parts. See also Sloman (1996).
Being aware of such relationships and their implications is an important feature of (well developed) human perceptual consciousness, and apparently also consciousness of some other intelligent species.
Piaget, unlike many researchers into number-cognition, had read deeply in philosophy of mathematics (including work by Gottlob Frege and Bertrand Russell) and knew that understanding one-one correspondences is central to understanding the natural numbers (as cardinals). Consequently, whereas many researchers assume that being able to recognize and name the numerosity of small collections of objects is evidence for possession of a concept of number, Piaget realised that far more is required: in particular a grasp of features of one-one correspondences that children acquire only gradually. In some cases they are not understood fully until close to age 6 years (Piaget, 1952). [The precise ages are not important: the possible forms of partial understanding are.]
Chapter 8 of Sloman (1978) attempted to illustrate some of the algorithmic and architectural requirements of a learner developing information about number names and how to use them in various practical tasks, all of which depend on the use of one or more one-one correspondences, including correspondences between objects or events and an initial sequence of number names. Some of the implicit themes were made explicit in the Note to Chapter 8, added in 2016. This section summarises a subset of that note.
In his (1952) Piaget used a variety of experiments to probe the ability of children to recognize and make use of one-one correspondences, and their abilities to reason about those correspondences, e.g. answering questions about whether and why the correspondences are or are not preserved by various actions. Many researchers attempted to replicate, or modify his experiments, but often labelled what they were studying as something like "understanding conservation", without any theory of what made such understanding possible. A useful summary of some of this work by Saul McLeod with videos can be found here: http://www.simplypsychology.org/concrete-operational.html
Piaget famously discovered the apparently staged development of the ability to understand that it is impossible for a one-one correspondence to be destroyed by a mere re-arrangement of the objects involved. His postulated stages of development need not concern us now.
Understanding that sort of "invariance" is essential to understanding the cardinal numbers. But before children reach full understanding many of them seem to regard a rearrangement that stretches out or compresses a collection of objects as altering numerical equality between that collection and another collection. So even if they have seen and accepted the original correspondence and have also seen objects being moved to form a perceptibly longer collection, but without addition or removal of any objects (i.e. simply creating a new one-one correspondence between initial and final elements of the groups), some of the children apparently think that this rearrangement changes the one-one correspondence between the objects as they were and the objects in the new configuration. But the questions asked do not normally explicitly refer to one-one correspondence. The experimenter may ask whether there are more objects than before the rearrangement. When children mistakenly say there are more, they may be answering the wrong question, or they may not understand the invariance. Figure Transitive, below, illustrates a special case of the problem.
This diagram summarises two physical examples with a common structure.
Case 1: there are three groups of objects A, B, and C (not labelled) in the figure. If elements of set A (on left) are in 1-1 correspondence with elements of set B in the middle, and elements of B are in 1-1 correspondence with elements of another set C, on the right, then the two correspondences can be "joined" to form a 1-1 correspondence between elements of A and elements of C. A child with an understanding of number will see that it is impossible for the first two correspondences to exist without the third also existing. This might be based on the visual ability to see how each link in the first correspondence can be combined with a unique link in the second correspondence to form a new link from the first to the third set.
This correspondence is not affected by the way elements of the sets are distributed in space: e.g. one set may be compact and another stretched out. Likewise for sets of events with different time-intervals between the events.
Case 2: the diagram can also represent one group of physical objects first translated from locations on the left to locations in the middle of the diagram, then translated to the locations on the right. When this happens the two transformations can be composed to form a new transformation from the left hand group to the right hand group. What a learner needs to understand is that despite the changes in appearance of the group of objects after each transformation, no new objects are added and none are removed, and there is a one-one correspondence between the initial locations and the final locations of the objects. It is impossible to destroy a one-one correspondence simply by moving objects around, if no objects are destroyed or merged or split into smaller objects.
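The structure shared by Case 1 and Case 2 can be written down explicitly. The sketch below is an illustration only, assuming each correspondence is stored as a dictionary from items of one collection to items of another; the names is_one_one and join are hypothetical. It shows how each link of the first correspondence is combined with a unique link of the second, and checks the result for a particular example.

```python
def is_one_one(mapping, source, target):
    """True if mapping pairs every item of source with a distinct item
    of target and leaves no item of target unpaired."""
    images = list(mapping.values())
    return (
        set(mapping) == set(source)
        and len(set(images)) == len(images)
        and set(images) == set(target)
    )


def join(first, second):
    """Chain two correspondences: follow a link of `first`, then the
    link of `second` that starts where it ended."""
    return {a: second[b] for a, b in first.items()}


# Three collections whose spatial layout plays no role at all.
A = ["a1", "a2", "a3"]
B = ["b1", "b2", "b3"]
C = ["c1", "c2", "c3"]
a_to_b = {"a1": "b2", "a2": "b1", "a3": "b3"}
b_to_c = {"b1": "c3", "b2": "c1", "b3": "c2"}
a_to_c = join(a_to_b, b_to_c)
```

Here is_one_one(a_to_c, A, C) holds, as it must. Note that nothing in the representation mentions where the items are located, which is why rearrangement cannot affect the result. The check, however, only inspects one instance after the fact; it does not model how a learner comes to see that the third correspondence could not fail to exist.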
As far as I know, Piaget was not able to explain how a child (or adult) can see in both Case 1 and Case 2, that if the first two correspondences exist the third must also exist. My impression is that many psychologists who have read or heard about this work by Piaget do not understand the deep implications of the computational requirement to represent and reason about 1-1 correspondences. So the label "conservation" is used to sum up what the children have or have not understood when they succeed or fail in Piaget's tests. And, as Annette Karmiloff-Smith once remarked, they try to vary the tests to find out whether children can pass some variant at an earlier age, but without providing any analysis of requirements for passing or for mechanisms that can meet those requirements.
"Decades of developmental research were wasted, in my view, because the focus was entirely on lowering the age at which children could perform a task successfully, without concern for how they processed the information." Karmiloff-Smith(1994)
Frege and Russell essentially tried to show that this is merely a case of (rather complicated) logical deduction that could be expressed in the symbolism of modern logic. However the ancient Greeks and many others had already discovered and used such properties of numbers long before the invention of modern logic.
Understanding the concept of cardinal number includes understanding why a one-one correspondence between two collections of discrete items is preserved no matter how the items are re-arranged, as long as no objects are removed, merged or separated into two or more parts. Piaget's work showed that this understanding does not come automatically with being able to count or being able to answer questions correctly in special cases. I suspect that neither Piaget nor anyone else knows how brains represent information about particular one-one mappings or acquire abstract non-empirical knowledge about general properties of transformations that involve one-one mappings. The concept of a one-one correspondence between two arbitrarily large collections of objects of any type (concrete, abstract, physical, mental, etc.) is not one that fits any mechanism I have ever heard a neuroscientist describe. Without that, our concept of cardinal number cannot be understood. I suspect animal brains, and especially human brains, use important mechanisms that have not yet been identified by neuroscientists.
Frege 1950 attempted to show that such mathematical knowledge is purely logical, but it is clear that mathematicians understood these properties of cardinality before the logical apparatus used by Frege and others had been discovered.
In order to understand how an AI system can understand the natural numbers as
they were understood before the rise of logic, we shall have to explain how it
can reason visually (e.g. using a diagram, or imagining a possible change in
some collection of objects) and thereby discover that one-one correspondence is
transitive (among many other properties). The figure illustrates what is
discovered but does not explain how. I don't think anyone knows how human brains
make such discoveries, how the discovered information is stored, how the brain
mechanisms allow future inferences to be made, and how all this knowledge is
acquired in a form that is independent of how many objects are involved, how big
or small they are, what shape they have and whether they are physical objects or
locations, or places, or abstractions such as number names, or how the
information that there are infinitely many possible cardinalities is represented
in brains. These cannot be statistical generalisations from perceived examples,
since when understood the generalisations are known to have no exceptions.
Moreover they can be understood as applying not only to collections previously
encountered but to arbitrarily large collections of objects or events or names,
etc. How can biological brains support such competences and discoveries? So far
I don't think these abilities have been replicated in AI systems, though I
suspect that may simply be because we have not yet discovered the right forms of
representation and the required information processing architectures.
[To be added: refer to work of Doug Lenat, Simon Colton, Alison Pease and others who have attempted to model mathematical cognition.]
It was not reported at the time, but is clear from the videos on the project web site (a) that she makes the hooks in several different ways, with (approximately) functionally equivalent results (i.e. using a crack in the plastic tray to grip the end of the wire, using the tape at the base of the tube to grip the wire, using her foot on a horizontal rail to grip the wire, and using a hole in the wall next to a small perch to grip the wire), (b) that there appears to be no random trial and error in her hook-making or hook-using behaviour, (c) that each of the episodes of hook-making and use involves several different steps in which some possibility is identified, and then new possibilities and impossibilities are achieved on the basis of the previously realised possibilities. There is no evidence that she had had experience of bending pieces of wire, or similar materials before these experiments, although when she first bent a straight piece of wire she had previously used a bent piece of wire provided by the experimenters.
Betty seemed to be conscious that
- it is possible for the end of a hook to be passed under the handle of the bucket holding food,
- it is possible to raise the hook in that configuration,
- it is impossible for the bucket to remain where it is when the wire is raised with the hook looped through the handle,
- continued raising of the hook will lift the bucket past the top of the glass tube,
- after rising beyond the top of the tube the bucket can be moved sideways then downwards to the table,
- food can be extracted when the bucket is on the table.
A detail that is easily overlooked is that when she inserts the hook into the tube she also places one foot on the rim of the tube. That provides two widely separated support points, presumably allowing more precise control of the wire when moving the end under the handle of the bucket. Later, she uses the foot on the rim to achieve sufficient height to pull the hook and the bucket out of the tube. Examining the last few seconds of the video suggests to me that without the foot on the tube she would not have been able to achieve the height required. How much of that Betty understood is not clear, but in the video of trial 7 she did not first try without grasping the rim.
Another sort of example is understanding some of the things that can be done
with holes, an example being Betty's ability to pass the end of a hook through
the "hole" provided by the bucket and its handle. A video of a
pre-verbal toddler apparently using a pencil to explore different routes through
a hole in a sheet of paper is here:
She seems to be testing a hypothesis about 3-D topology (the existence of continuous deformations between two configurations) though she could not, at that age, have formulated such a hypothesis in words.
As far as I know there is no current robot that can become aware of such a possibility and thereby acquire the motive to make the possibility actual, which seems to happen a great deal with human children (illustrating conjectures about "Architecture-based motivation" Sloman (2009b)).
We can formulate the "Side stretch theorem" (SST) in two parts:
(SST-out) IF a vertex of a triangle is moved along an extended side away from the interior of the side (as in Figure S) THEN the area of the triangle increases.
(SST-in) IF a vertex of a triangle is moved along a side towards the interior of that side, THEN the area of the triangle decreases. (Draw your own figure for this case.)
This illustrates a deceptively simple example of a mathematical relationship between a type of spatial structure, types of process that can occur, and types of change necessarily associated with those processes. The simplicity is deceptive because although you are likely to find the claimed mathematical relationship obviously true, it is not at all clear what sort of visual information-processing mechanism provides the ability to think about the possibility of change when it is not actually occurring and to find the relationships obvious. For example, the formulation in terms of an area increasing or decreasing presupposes the existence of a shape-independent measure of area, and that any such measure exists is not obvious. Clarifying that concept for an arbitrary area was a major mathematical achievement, closely related to the discovery of integral calculus. However, there is a simpler mathematical discovery that does not require the notion of a measure, only the notion of inclusion or containment, a part-whole relationship.
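Both parts of the Side Stretch Theorem can be checked numerically for particular triangles. The sketch below is an illustration only, and it assumes exactly the things the theorem's intuitive obviousness does not seem to need: Cartesian coordinates and a numerical measure of area (here the standard cross-product formula). The names area and move_along_side, and the parameter t, are hypothetical.

```python
def area(p, q, r):
    """Area of triangle pqr from the cross product of two edge vectors."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return abs(cross) / 2


def move_along_side(a, b, t):
    """New position of vertex b after sliding it along the line through
    a and b. t = 1 leaves b where it is, t > 1 moves it outwards along
    the extended side, and 0 < t < 1 moves it inwards towards a."""
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


# An arbitrary example triangle; vertex B will slide along side AB.
A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
original = area(A, B, C)
stretched = area(A, move_along_side(A, B, 1.5), C)   # SST-out
shrunk = area(A, move_along_side(A, B, 0.5), C)      # SST-in
```

In this example the stretched triangle has a larger area than the original and the shrunk one a smaller area. In coordinate terms the reason is that the distance of C from the line AB is unchanged while the length of the side AB grows or shrinks, so the area changes in proportion to t. A calculation of this kind confirms instances; it does not explain how the relationship can be seen to hold for every triangle without calculating anything.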
For now, I'll leave open the question whether the Side Stretch Theorem (SST-in/out), or the Side Containment Theorem (SCT) can be derived from something more basic and obvious, requiring biologically simpler, evolutionarily older, forms of information processing.
Note, however, that the notion of the vertex being "moved along" a line "away from" or "towards" another point on the line implicitly makes use of a metrical or semi-metrical (ordering of lengths without a numerical measure) notion of length, which increases or decreases as the vertex moves. The concept of motion between two locations on a line also implicitly assumes the existence of intermediate locations between those locations.
The thought experiments discussed there could all have been performed by ancient thinkers long before the development of modern logic and formal methods of proof. Such pre-logical mathematical thinking was the only kind available to Euclid and his predecessors, since modern logic was developed only in the last few hundred years.
Conjecture: the ability to make those mathematical discoveries, and others in Euclid's Elements, depended on abilities to notice and reason about possibilities, of the sorts discussed above, along with other social abilities necessary for publishing or teaching the materials.
Work to be done still includes identifying the precise visual functions needed and precise specifications for the mechanisms providing those functions. We may then be able to build artificial systems (e.g. robots) whose visual and mathematical capabilities are far more like those of humans than any existing robots.
Until we know how to do that, our robots and other forms of AI will all be severely limited and capable only of very simple forms of learning. They will not be capable of the forms of learning and discovery that drove mathematical discoveries in humans.
You can remove the string by pulling one end, or by pulling the other end. Why can't you remove it even faster by pulling both ends? What needs to be added to current robots to enable them to (a) discover such impossibilities, (b) understand why they are not possible?
If you pull both ends at the same time, there is a configuration that can be achieved faster: what configuration? The ability to answer that might be based on searching through a mass of data concerning previous pulling episodes. But that isn't required. What sort of ability would enable a robot to answer the question without resorting to experiments with strings and holes, and without searching through stored records of previous such experiments? How do you answer the question?
It was argued that such reasoning could be as useful, and as reliable, as reasoning based on logical and algebraic forms of representation. In both cases the reasoner has to make assumptions about how the form of representation works, i.e. what various structures and transformations represent, and on the basis of such general assumptions draw conclusions by reasoning about particular cases. These forms of reasoning, familiar in uses of maps, in uses of diagrams in physics and electronics, and in architectural drawings are all examples of valid reasoning that is not based on statistical evidence or inferred probabilities.
In simple cases such reasoning can make use of imagined transformations of an imagined diagram: there is no need for a physical diagram to be used. (A similar comment applies to reasoning using a logical notation, or natural language: the reasoning can be done "in one's head".)
I suspect that pre-verbal children and some highly intelligent non-human animals (e.g. squirrels and crows) are to some extent capable of such valid reasoning using non-Fregean forms of representation internally, but unlike adult humans they are incapable of reflecting on what they are doing, communicating it to others and defending or criticising the validity of the reasoning. That requires meta-cognitive extensions to the information-processing architecture.
There has been considerable psychological research on human "mental rotation" abilities, initiated by Shepard and Metzler, summarised in https://en.wikipedia.org/wiki/Mental_rotation, comparing difficulty of tasks, times required to detect rotation, etc.
The following figure (from Wikipedia) includes typical 2D and 3D examples of mental rotation challenges.
There is a large freely available collection of pairs of images of 3D structures made of rectangular blocks here: https://openpsychologydata.metajnl.com/articles/10.5334/jopd.ai/
Ganis, G. & Kievit, R.A., (2015). A New Set of Three-Dimensional Shapes for Investigating Mental Rotation Processes: Validation Data and Stimulus Set. Journal of Open Psychology Data. 3(1), p.e3. DOI: http://doi.org/10.5334/jopd.ai
For a machine vision system performing this task, the fact that the structures compared decompose into straight segments, or segments composed of cubes or rectangular blocks meeting at right angles, makes it possible for a relatively simple algorithm to check in a finite number of steps whether superposition is possible (left as an exercise for the reader).
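One possible answer to the exercise, for the special case of structures built from unit cubes, is sketched below. It assumes that each structure has already been extracted from the image and is given as a collection of integer (x, y, z) cube positions, which is itself a substantial vision problem not addressed here. The names rotations, normalise and can_superimpose are hypothetical. The method tries each of the 24 rotations that map the cubic grid onto itself, slides the result so that its minimum coordinates are zero, and compares the two sets of cubes.

```python
from itertools import permutations, product


def rotations():
    """Yield the 24 proper rotations of the cubic grid, each as an axis
    permutation plus sign flips whose combined determinant is +1
    (determinant -1 would be a reflection)."""
    for perm in permutations(range(3)):
        inversions = sum(
            1 for i in range(3) for j in range(i + 1, 3) if perm[i] > perm[j]
        )
        perm_sign = -1 if inversions % 2 else 1
        for signs in product((1, -1), repeat=3):
            if perm_sign * signs[0] * signs[1] * signs[2] == 1:
                yield perm, signs


def normalise(cells):
    """Translate a set of cubes so its smallest x, y and z are all 0."""
    cells = list(cells)
    mins = [min(c[i] for c in cells) for i in range(3)]
    return frozenset(tuple(c[i] - mins[i] for i in range(3)) for c in cells)


def can_superimpose(a, b):
    """True if structure a can be rotated and translated onto b."""
    target = normalise(b)
    for perm, signs in rotations():
        rotated = [tuple(signs[i] * c[perm[i]] for i in range(3)) for c in a]
        if normalise(rotated) == target:
            return True
    return False


# A four-cube "screw", a rotated copy of it, and its mirror image.
screw = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
turned = [(-y, x, z) for x, y, z in screw]      # quarter turn about z
mirrored = [(x, y, -z) for x, y, z in screw]    # reflection in the xy plane
```

With these definitions can_superimpose(screw, turned) succeeds and can_superimpose(screw, mirrored) fails, because the search covers every rotation and finds none that works. This brute-force enumeration is available only because the blocks meet at right angles, and nothing suggests that brains detect impossibility of superposition in this way.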
But as far as I know nobody has investigated brain mechanisms that could enable brains to detect impossibility of superposition by rotation and translation. I suspect individuals use a large collection of learnt heuristics that vary according to culture, age and individual.
Things get far more computationally explosive if more general shapes are used, as illustrated here using 2D shapes as examples. (Compare Minsky and Papert on Perceptrons. [REF])
For different pairs of images different heuristic methods can lead fairly quickly to answers, e.g. counting the number of 'ends' in each image (some have four some have two), testing whether a shape can be coloured in more than one colour without the two colours merging (a test for connectedness), imagining a feature of one image being superimposed on the other and following round both images looking for a location of mis-match, and finding what happens if attempts are made to imagine pairs of ends superimposed then checking how far the superposition extends. (That reveals that two of the images that are reflections of each other about a line in the plane cannot be superimposed.)
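Two of these heuristics can be sketched in code, on the assumption (not made in the original) that each shape has been reduced to a set of occupied (row, column) cells on a grid, with strokes one cell wide. The helper names are hypothetical. The tests are only necessary conditions: they can quickly rule out congruence, but passing them does not establish it.

```python
def neighbours(cell, shape):
    """Return the occupied cells directly above, below, left or right of cell."""
    r, c = cell
    candidates = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
    return [n for n in candidates if n in shape]


def count_ends(shape):
    """Count 'ends': occupied cells with exactly one occupied neighbour."""
    return sum(1 for cell in shape if len(neighbours(cell, shape)) == 1)


def is_connected(shape):
    """Spread one 'colour' from an arbitrary cell and check it reaches every cell."""
    if not shape:
        return True
    start = next(iter(shape))
    coloured = {start}
    frontier = [start]
    while frontier:
        cell = frontier.pop()
        for n in neighbours(cell, shape):
            if n not in coloured:
                coloured.add(n)
                frontier.append(n)
    return coloured == set(shape)


def may_be_congruent(a, b):
    """Cheap filter: False means definitely not congruent, True means 'not yet ruled out'."""
    return (
        len(a) == len(b)
        and count_ends(a) == count_ends(b)
        and is_connected(a) == is_connected(b)
    )
```

A plus-shaped figure has four ends and a straight bar has two, so that pair is rejected without any attempt at superposition.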
Which pairs of shapes below are congruent if translation and rotation in the plane are allowed, but not flipping over or reflection across a line? Which can be made congruent if reflected across a line? Which images are proper parts of others, possibly after reflection and/or rotation?
By now some readers will have noticed that far from being purely mathematical challenges unrelated to anything else, these challenges are closely related to a common type of picture puzzle for children, where the task is to look at a picture and find pictures of one or more familiar objects forming undifferentiated parts of the picture. The concept of impossibility is relevant to this task in order to rule out answers based on finding a structure that could be part of something (e.g. a bicycle) but whose continuation could not be.
In that sense impossibility detection is part of common recognition tasks. The same can be said of the task of identifying the grammatical structure of sentences (including "garden path sentences") that have a part with a possible interpretation made impossible by another part of the sentence, as in this old example
Detecting impossibility of changing containment relationships:
E.g. consider two closed non-self-crossing curves in a plane. If one curve is inside the other then continuous deformations can get them to coincide without it ever being the case that one curve crosses the other (i.e. part of curve 1 is inside curve 2 and another part of curve 1 is outside curve 2 during the process of merging).
However, if neither curve is initially inside the other, no such continuous deformation in the plane can get them to coincide.
If the two closed curves are on the surface of a sphere there is no such impossibility. If the two closed curves are on the surface of a torus it may or may not be impossible, depending on their initial locations.
See
The example generalises to two closed continuous surfaces in 3D space: if one is contained in the other then smooth deformations can get them to coincide without there ever being a place where one surface crosses the other (i.e. they never intersect in a line in 3D space). However, if neither surface contains the other, no continuous deformations can get them to coincide without the surfaces ever crossing. (What would correspond to the case of two curves on the surface of a sphere?)
What brain mechanisms can detect such impossibilities? Can this be done by any robot vision system so far built?
Also reasoning about curves on a torus:
Reasoning about putting on a shirt
Reasoning about rings and chains
Reasoning about what can't be done with chained rubber bands
Added 20 Nov 2016 (only loosely related):
The Shepard (rotated table) illusion and some others
This section will later be expanded.
The discovery of such useful re-usable patterns essentially involves mathematical abstraction, though in these cases the abstraction is not done consciously or deliberately: Evolution is a "Blind mathematician", implicitly discovering and using mathematical facts long before there were human mathematicians.
It can be argued that this provides one of several types of "foundation" for mathematics, as discussed in
Visual process perception is a particularly difficult problem, since, as we'll see, perceived scenes can have very complex structures and can change in ways that are not well represented by the kinds of mathematics currently used for dynamical systems, e.g. using collections of parameters and their derivatives (of various orders). Such a mode of representation would not, for example, be well suited for describing changes during construction of a meccano model such as the model crane illustrated in this document:
In any perceived portion of the physical world there will be collections of objects with various properties and various relationships and various changes going on, so that perceptual information includes not only static scene information, but information about processes. It is arguable that for biological vision process perception is the main achievement, and is far more demanding than perception of static scenes, or scenes that are almost static except for the motion of the viewer. For every possible static configuration of objects there are many different possible processes that either start or end with that configuration or transiently include that configuration. So the combinatorics of process perception vastly outstrip the combinatorics of static scene recognition and description.
One of my reasons for selecting a video taken by a camera moved around in a garden is that plant scenes (below) can show a great deal of structure on different levels of scale with a wide range of variation within each type of structure, and with structures that change radically from one location to another and from one view of a particular location to another. So videos, or live views, from changing viewpoints in natural or cultivated environments can provide processes in which the complexity and speed of changes, and the variety of types of change, provide extreme challenges to natural or artificial vision systems.
Those challenges seem to be exacerbated for viewers with two eyes moving at once and constantly having to recompute the information required for perception of 3-D distances and structures. I suspect that very fast and accurate, but low-resolution stereo vision uses large scale corresponding object features (e.g. vertical or approximately vertical edges or portions of edges, of doorways, tables, cups, tree-trunks, etc.) in results of monocularly processed images.
Those results can guide the search for more fine-grained very low level correspondences for precise, close up, stereo vision using small scale image features. That guidance is not available in random dot stereograms, so stereo merging is then much more complex and much slower, though it can give higher resolution when successful. This aspect of stereo vision may have evolved later.
Although useful some of the time this mechanism for very low level stereo vision is not essential for normal vision, and there are significant numbers of people who cannot see random dot stereograms, although their 3-D vision appears to be normal most of the time.
Moreover, whereas an agent moving voluntarily may be able constantly to provide information about the (intended, or actual) changing viewpoint for use by the visual system in integrating information across time and between eyes, viewing a previously produced video does not make that extra motor-control-based information available. Yet for many decades humans have been happily sitting in fixed seats viewing cinema shows and television shows in which there's a great deal of motion, including changes of viewpoint. The fact that intentional or kinaesthetic or vestibular (semi-circular canal) information about viewer motion is not available does not seem to be a great handicap. It's possible that that is true only for animals that spend a great deal of their early life being carried by a parent, so that evolution was under pressure to develop motion perception systems that work without the support of motor control information available to a voluntarily moving agent.
Some of the reader's perceptual processing capabilities in a garden are demonstrated here: http://www.cs.bham.ac.uk/research/projects/cogaff/misc/vision/plants/garden-vids.html
For a low-level tutorial introduction
One is the complexity and speed of change that can be perceived by the human visual system (even if we don't see as much as we think we do!). This seems to require concurrently changing structural relations of different sorts across both scales and ontologies (also illustrated for static scene perception in Chapter 9 of Sloman 1978). This is the phenomenon Clowes labelled perception involving different domains.
The second challenge is the diversity of ontologies required for perception of different scenes, and different aspects of the same scene, including not only the many different parts of a typical horticultural center, but also changes across types of things seen including maps, handwritten text, printed text, music, computer programs, mathematical notations, and the different written/printed forms some bi-lingual or multi-lingual readers can cope with. This sort of ontological diversity is illustrated in the garden videos referenced above.
It is not obvious to me that current forms of artificial computation or known forms of neural computation can handle such varieties of complexity at the sorts of speeds human visual systems (and presumably also visual systems of other fast moving animals) can deal with.
Question/Conjecture: A biological role for quantum computation?
Could this processing challenge prove to be the first significant and non-artificial challenge for quantum computation -- a much more urgent problem than getting computers to understand Goedel's theorem?
Any such biological solution will almost certainly require the use of sub-neuronal information processing mechanisms (e.g. possibly microtubules suggested by Hameroff and Penrose as solutions for far less compelling and demanding challenges?).
An important point in this context is that the key features of quantum mechanisms required do not include the randomness that is so often emphasized. Instead, what is important in many biological phenomena is the opposite: quantum mechanisms allow structures like chemical bonds to form that are highly resistant to perturbation: a feature that is essential for preservation of genetic material from one generation to another, as pointed out by [Schrödinger 1944].
Evolution has clearly used the mechanism of chemistry (and therefore quantum physics) to create enduring structures that persist despite many surrounding changes. If that feature is also used for information-processing, but with much faster locking and unlocking of states, combined with superpositions of sets of alternative possibilities (another important feature of quantum mechanics) then perhaps the use of quantum mechanisms for solving constraint propagation problems that pervade visual processes may turn out to be essential for recently evolved brains.
For some organisms this feature may be repeated at meta-cognitive levels, allowing introspective mechanisms to access complex ongoing processes to produce "summary" descriptions of what's going on, and also to re-direct those processes -- a mixture of bottom-up and top-down processing that has increasingly pervaded computer systems engineering in the last 60-70 years.
Such forms of meta-cognition may not be required for organisms to make use of mathematical structures in their perceptual processing and problem solving, but they are essential if information-processing systems are to have useful knowledge of what they are doing, so that they can control it according to specific goals, learn from it (e.g. debug features that lead to failures) and produce new kinds of functionality: as computer scientists and engineers have been doing for decades -- but for systems much less complex and sophisticated than brains.
These mechanisms will require specific extensions to the collection of construction-kits produced and used by biological evolution, discussed in
I don't know enough about quantum mechanics or neuroscience to take this possibility further. If anyone who has the required expertise would like to discuss this please get in touch with me.
The key implication for us is that a system whose capabilities include detection and representation of possibilities for change, even when those changes are not occurring, potentially has massive requirements beyond what is needed for scene perception. One way to deal with this is to find ways of representing or categorising, and reasoning about, classes of possibilities that are relevant to an organism, rather than handling all possibilities separately. That can be achieved by using appropriate levels of abstraction, ignoring details that vary within a class of cases. I suspect evolution discovered and used far more examples of that strategy than have been identified: in that sense evolution is more a blind mathematician than a blind watchmaker.
People who have had no experience of designing information-processing systems are surprised by the phenomenon of change blindness, and ask "Why don't we see the changes?", instead of asking "Why do we see changes?" and "What mechanisms make it possible to see changes?" First you need to explain how it is possible to see anything at all. Seeing changes adds significant extra complexity, that may not be obvious to people who have never tried to design a working vision system.
A video recorder recording a complex scene with constant changes does not see any changes, because it does not see. It does not know the difference between recording a constant scene and a changing scene. It has no idea what information is contained in the recording. All it does is acquire information that we (or our programs) can access about a long sequence of very short episodes where information about very short-lived states of a collection of photo-receptors is copied into some form of enduring record.
For specific changes to be detected more is required than for changing information to be recorded. What more?
Is it trivial to give a computer based vision system an ability to detect change in a scene? What if it is not fixating a particular location but continually centering on different parts of the scene, while only one small part of the scene changes? What would be required for that change to be detected? The machine would need a kind of enduring memory for perceived states of the parts of the environment being monitored and continual comparisons between newly acquired information about those states and the recorded state information. However, correspondences between locations in the environment and locations in the records would keep changing between frames (snapshots).
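The bookkeeping described above can be sketched as follows. This is a toy illustration under strong assumptions that are not in the original: the scene is a small grid of values, each glimpse is a square patch, and the system is told exactly where each glimpse was taken, so that patch coordinates can be converted to scene coordinates. The names `glimpse` and `ChangeDetector` are hypothetical. A real vision system would have to work out those correspondences for itself, which is the hard part.

```python
def glimpse(world, top, left, size):
    """Return the size x size patch of the scene whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in world[top:top + size]]


class ChangeDetector:
    """Enduring memory of scene locations, compared against each new glimpse."""

    def __init__(self):
        # Maps a scene location (row, col) to the value last seen there.
        self.memory = {}

    def observe(self, patch, top, left):
        """Compare a glimpse with memory; return (location, old, new) for each change."""
        changes = []
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                # The same scene location lands at different patch
                # positions as the fixation point moves.
                location = (top + i, left + j)
                if location in self.memory and self.memory[location] != value:
                    changes.append((location, self.memory[location], value))
                self.memory[location] = value
        return changes
```

Note that a change is detected only if the location had been seen before and falls inside a later glimpse: changes elsewhere go unnoticed, which is one reason change blindness should not be surprising.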
Many biological control mechanisms (homeostatic mechanisms in particular) perform functions based on change detection: some information about a useful or desired state (e.g. body temperature, or fluid pressure) may be stored somewhere as a target state. Then potentially constantly changing actual sensed states are compared with that target state: if there is a discrepancy then some compensatory mechanism to reduce the difference may be activated, or its level of activation modified.
More sophisticated versions compare current and recent recorded states to see whether the gap between target state and actual state is increasing or decreasing -- using mechanisms that regularly replicate and retain current sensed state, long enough to compare it with the target state.
If the gap/discrepancy exists and is increasing, the strength of the gap-reducing influence may be increased. But if the gap is already reducing, then it may be useful to reduce the gap-reducing influence -- to avoid over-shooting the target state. These considerations will be very familiar to any control engineer accustomed to designing control systems where all relevant states can be represented by measures (including measures of change and measures of rate of change, etc.).
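For a single measured quantity this idea can be sketched as a simple proportional-plus-rate rule of the kind used in control engineering. The function names, the gain values and the temperature figures are illustrative assumptions, not taken from any biological mechanism.

```python
def corrective_influence(target, sensed, previous_sensed, gain=0.5, rate_gain=0.3):
    """Strength of the gap-reducing influence for one sensed quantity.

    The influence grows with the current gap, is boosted when the gap is
    widening, and is reduced when the gap is already closing.
    """
    gap = target - sensed
    previous_gap = target - previous_sensed
    gap_change = gap - previous_gap  # same sign as gap when the gap is widening
    return gain * gap + rate_gain * gap_change


def simulate(target=37.0, start=35.0, steps=50):
    """Repeatedly apply the influence to a toy 'body temperature'."""
    previous = sensed = start
    for _ in range(steps):
        influence = corrective_influence(target, sensed, previous)
        # Retain the current sensed state long enough to compare it next time.
        previous, sensed = sensed, sensed + influence
    return sensed
```

With the same current gap, the rule responds more strongly when the previous reading was closer to the target (the gap is growing) than when it was further away (the gap is shrinking). Everything here is a single number, which is exactly the simplification that does not carry over to visual information.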
Visual information has a far more complex structure than a particular measure such as pressure or temperature at a particular location, or a collection of measures. So the mechanisms for making use of the visual information need to be far more complex -- and the types of information obtained may be far richer.
There may be over a million sensors in an eye, with varying receptive fields, and different policies for connecting the sensor states to other parts of the nervous system.
Moreover saccades, head movements, body movements and movements of objects in the environment can all produce substantial and in some cases very fast changes in the patterns of neural activation. With all those high speed constantly changing patterns of activation, with constantly changing relationships to the light sources and light-reflecting surfaces, and in some cases constantly changing goals (e.g. while running across uneven, rocky terrain, where feet have to be carefully placed on new surfaces at high speed) no simple and obvious ideas about how the information is processed are likely either to explain the phenomena or to provide a basis for designing robots with visual capabilities of humans or other highly mobile animals.
For all these reasons, the detailed ways in which visual information about the environment is actually used must be far more complex than the cases considered by Gibson, where relatively simple mathematical transformations can extract information from changing retinal stimulation (e.g. the "rate of expansion" in an optical flow pattern, where flow radiates outward from a target location).
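To make "relatively simple" concrete, here is a standard textbook illustration (an assumption added here, not a formula quoted from Gibson). Suppose a surface patch of physical size S lies at distance Z(t) directly ahead and is approached at constant speed v. For small visual angles, its angular size and rate of expansion are related as follows:

```latex
\theta(t) \approx \frac{S}{Z(t)}, \qquad
\dot{\theta}(t) \approx -\frac{S\,\dot{Z}(t)}{Z(t)^{2}} = \frac{S\,v}{Z(t)^{2}}
\quad (v = -\dot{Z} > 0), \qquad
\tau(t) = \frac{\theta(t)}{\dot{\theta}(t)} \approx \frac{Z(t)}{v}.
```

So the ratio of angular size to rate of expansion gives the time remaining before contact, without the viewer needing to know S, Z or v separately. The point of the surrounding argument is that most natural scenes do not reduce to a single ratio of this kind.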
In a more complex visual scene different things can be changing in different directions at the same time. Mechanisms for detecting and using information about all those changes will need to be very complicated, except in cases where multiple changes are understood as different aspects of one fairly simple change: e.g. where textured items on a floor move in a complex but regular way because of horizontal motion of the viewer (as noted by Gibson and others), or the less obvious case where many cogs on a wheel are perceived to be moving in different directions, because the wheel is rotating about its axis. Gibson drew attention to the ways in which such patterns of stimulation can provide useful information about speed and direction of relative motion to an animal with eyes.
But those scenes based on large regular structures and processes are relatively rare outside human-designed environments, and are mostly products of relatively recent human construction capabilities. In contrast, the changes in patterns of retinal stimulation of an animal moving in its natural environment may be far more complex, and carry information that is far more complex than any of Gibson's examples, and require far more complex mechanisms than he seems to have considered.
The garden videos mentioned below, recorded using a camera moved around near flower beds, shrubs, bushes, and trees, provide examples showing how, when a camera or eye moves around in a densely foliated space, instead of a mathematically regular structure producing a time-varying retinal image that changes in a mathematically regular way, there are very many different spatial structures moving to some extent independently (e.g. when stirred by a breeze). Yet you can probably keep up with a high level overview of what's happening to the changing viewpoint and a lot of information about what is in the field of view, and where things are moving in and out of the field of view, in real time.
I don't know at what age a child is able to do this sort of thing, and I don't know how many other species can do it, but it seems likely that many birds, tree-climbers, hunting mammals and others will have at least partly similar visual capabilities.
The videos show large numbers of partly coordinated, partly uncoordinated, changes. How a seeing machine can be made to see all the changes evident to a normal (adult?) human in those videos is a hard, unsolved research problem. However, the hardest problem may not be specifying the design, but specifying the requirements to be satisfied by a good design. By what criteria could we evaluate a machine vision system's performance in interpreting such videos as evidence for a good model (or simulation) of human vision in that sort of context? It is not at all clear what the requirements are. Part of the reason is that so much more happens in human (and presumably animal) visual processing than could be characterised by results of behavioural tests.
These capabilities cannot all be products of a single, general-purpose, learning mechanism acting on the organism's sensory and motor signals during the early years of life, if they are based in part on genetic information built up many years ago by ancestors in different lineages. In particular, a totally general learning mechanism would be much too slow for precocial species where the offspring need to move around pecking for food unaided (ducklings, chicks) or need to be able to run with the herd soon after birth (e.g. wildebeest calves only a few hours old, without having had time to learn).
Don't assume a teacher with prior knowledge of the theorems has to be involved: someone must have made some of these discoveries without being told them by a teacher.
NOTE 1 This kind of discovery of primeness by a computer program was discussed in Pease et al. (2010). But their program seems to have been given a systematic way of carving up spatial possibilities by the designers. It did not understand enough to work that out.
NOTE 2 One of the fundamental requirements for mathematical thinking is being able to organise collections of possibilities and making sure that you have checked them all. If you can't do that you don't have a mathematical result, only a guess. How can you know that you have checked all possibilities? The history of mathematics shows that even brilliant mathematicians can make mistakes Lakatos (1976). This means that the traditional emphasis on the role of "certainty" in mathematics may be misguided: certainty, or its absence, like infallibility or its absence, is a matter of the psychology of mathematicians, not the subject matter they investigate, which is something richer and deeper: a feature of the universe that was playing a role in evolution (the "Blind Mathematician") long before human mathematicians existed. Computers, like drawings in sand, slates and chalk, pen and paper, 3-D models made of wires and beads, and other aids to thinking and communication, have expanded what human mathematicians can do, but not changed the nature of the subject matter. Some are tempted to conclude that mathematics is essentially a social phenomenon. That's true only for relatively weak-minded mathematicians (e.g. human mathematicians). See also Wolfram (2007).
Although some researchers may regard that list as comprehensive, it leaves out some important functions of vision that are rarely noticed by vision researchers (although there are researchers who do investigate some of them). For example, vision is often used for obtaining information about what is possible or impossible in the environment and obtaining information about how the environment relates to abilities, risks, needs, or intentions of other agents. Information about what is or is not possible is relevant not only to the immediate practical uses of vision, but also to mathematical discoveries, as we'll see. This generalises Gibson's claim that the function of vision is to provide information about affordances for the perceiver in the environment Gibson (1979).
In particular, my complaint about omission of important functions of vision (and more generally perception) applies to statistics-based models and theories of vision that have been very successfully applied to special purpose robots and other machines with limited(!) functionality.
A core symptom of the inadequacy of those models and theories is that their proponents (in my experience) neither acknowledge, nor attempt to explain, the roles of vision in mathematical discovery, such as discovery of theorems and proofs in Euclidean geometry (including topology). Those discoveries are concerned with what is possible (e.g. it is possible, using compasses, to produce circles of any radius around a specified point, in a specified plane), what is impossible (e.g. it is impossible for three finite planar surfaces to completely enclose a 3-D space, though three lines can completely enclose a 2-D area in a plane), and what is necessarily the case, e.g. if A is longer than B and B is longer than C then A must be longer than C.
(NOTE: the concepts of possibility, impossibility and necessity have nothing to do with probability. In particular, 0% probability is not the same thing as impossibility and 100% probability is not the same thing as necessity).
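A standard textbook illustration of the gap between probability and modality (added here as an example, not drawn from the original): let X be a number chosen uniformly at random from the interval [0, 1].

```latex
X \sim \mathrm{Uniform}[0,1]: \qquad
P(X = x_0) = \int_{x_0}^{x_0} 1 \, dx = 0 \quad \text{for every } x_0 \in [0,1],
\qquad
P\!\left(X \neq \tfrac{1}{2}\right) = 1 .
```

Every particular outcome has probability 0, yet some outcome occurs, so none of them is impossible. Likewise the event that X differs from 1/2 has probability 1 without being necessary. By contrast, no amount of sampling could turn "A is longer than B, B is longer than C, and C is longer than A" into something that merely has low probability: it is ruled out by the structure of the relations.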
I'll offer some conjectures about the evolutionary history of visual functions and mechanisms in humans and other species, including abilities to see what is and is not possible in a situation: a very different type of function from merely recording sensory states or environmental states, responding to those states or predicting future states. This is related to a contrast between online and offline intelligence. Offline intelligence is central to science and mathematics, and also to some forms of perception of affordances.
Another theme, not developed much in this document, is that understanding varieties of human consciousness and the underlying mechanisms, requires understanding kinds of mathematical consciousness that lead to mathematical discoveries: e.g. as a result of noticing that something seems to be possible, impossible or necessarily the case, and then asking "How do I know?" This relates some kinds of mathematical discovery to forms of introspection, which in turn are related to forms of information-processing architecture produced by biological evolution that support the required varieties of introspection: architectures that include meta-cognitive mechanisms.
I suspect some other species, and pre-verbal children, can make use of mathematical discoveries, but lack the information processing (meta-cognitive) architectures required to notice what they are doing and understand why it works, or why it never works!
NOTE: Examples are presented in the discussion of "Toddler theorems" here:
The questions and tentative answers presented here are inspired in part by Immanuel Kant's philosophy of mathematics in Kant (1781), in part by an extension of James Gibson's ideas about the functions of vision (Gibson (1966) and (1979)), and in part by work by Max Clowes in the 1960s and 1970s concerning vision's connection with multiple "domains", and the possibilities of ambiguity, paraphrase, and anomaly (impossibility) in visual contents (see Clowes Tribute). I make particular use of the first picture of an "impossible triangle", or, more precisely, an impossible configuration of cubes produced by Oscar Reutersvard, in 1934, introduced piecemeal above. It is richer, in interesting ways, than many better known pictures of impossible objects.
The functions of vision that relate to mathematical discoveries also require meta-cognitive mechanisms -- using more than one layer of meta-cognition -- closely integrated with vision. In particular, mathematical discovery often requires a form of introspection, first generating discoveries similar to "This seems to me to be impossible", then "Why am I sure this is impossible?", followed later by attempts to influence the introspections of other mathematicians (or learners). But that claim is still highly programmatic: there are still important gaps in the work, including gaps regarding architectures for cognition and gaps regarding the forms of representation required. For example, logical and algebraic forms do not seem to be adequate. (I am not claiming that these mathematical discovery processes cannot be implemented in computers or robots, merely pointing to unsolved problems, continuing work begun in Sloman (1971). Contrast the impossibility claims in Penrose (1994).)
This paper is also related to work on virtual machinery and consciousness, including the causal powers and varieties of "privacy" (of qualia) in virtual machines discussed in Sloman&Chrisley (2003), and Sloman (2013).
Above all, this paper continues a decades long attempt (begun in my DPhil thesis (1962)) to understand the nature of mathematics and the biological mechanisms that made it possible for Euclid and his predecessors to achieve so much, long before the development of modern logic, algebra, formal methods, and proof theory.
The mechanisms of mathematical discovery by humans have deep roots in biological mechanisms for information processing, but the mathematical facts discovered are not products of biology or human culture. The same facts should be discoverable by any sufficiently intelligent animal or machine. Some of them were "implicitly discovered" by natural selection mechanisms as explained here.
Some of the mechanisms required may be incapable of being implemented in digital computers as we know them. Some wild, half-baked, speculations are offered about possibly relevant quantum mechanisms (for constraint propagation). There may, or may not, be a requirement for specific forms of quantum computation in some particularly demanding forms of perception, e.g. of contents of an open-air horticultural centre, while the wind is blowing and the perceiver is moving (illustrated by videos above).
This has nothing to do with the arguments of Penrose and Hameroff, as far as I can tell. That topic is still left open here.
Experiences of mathematical discovery are among the richest, deepest, and most useful experiences of which humans are capable. Many of them have deep connections with biological functions of animal vision. That's because evolution produced visual mechanisms whose main functions include discovery and use of affordances of many kinds, and affordances, like the contents of mathematical discoveries, are not merely concerned with what is the case, but also with what is and is not possible, or necessarily the case. (This has nothing to do with probabilities or statistical information.)
The overlap between common animal functions of visual perception and the peculiarly human ability to make mathematical discoveries using vision illustrates the slogan of Max Clowes: "Perception is controlled hallucination", vaguely echoing von Helmholtz: "Perception is unconscious inference", and Kant's suggestion that all our empirical knowledge is made up of both "what we receive through impressions" and of what "our own faculty of knowledge supplies from itself". Kant, however, did not, as far as I know, link the powers of our faculties to their evolutionary history or to related capabilities in other intelligent species, as I am attempting to do. Neither did he have the opportunity I have had to use computers to work on problems in Artificial Intelligence. I suspect he would have appreciated the deep philosophical importance of AI instantly.
NOTE: AI vision mechanisms based on so-called deep learning are incapable of producing or explaining the mathematical competences discussed here, insofar as they merely deduce probabilistic relationships from statistical evidence. That cannot support conclusions about impossibilities and necessary connections -- the stuff of mathematics.
Different "layers" of such machinery, representing structures, relationships, and processes with additional types of manipulation and inspection machinery were added in later stages of evolution, supporting meta-cognitive representations of spatial structures and processes.
Some preliminary requirements for such machinery, based on analysis of some cognitive processes involving imagined distortion of triangular shapes, can be found in these documents, though the ideas will be extended later:
Ludwig Wittgenstein wrote, in his "Tractatus Logico-Philosophicus":
3.032 It is as impossible to represent in language anything that 'contradicts logic' as it is in geometry to represent by its coordinates a figure that contradicts the laws of space, or to give the coordinates of a point that does not exist.
3.0321 Though a state of affairs that would contravene the laws of physics can be represented by us spatially, one that would contravene the laws of geometry cannot.
It is not clear what 3.032 is claiming about language, since far from being impossible it is easy to express in words things that are logically inconsistent, e.g. if I were to claim that that is both possible and impossible. Wittgenstein must have known this.
3.0321 is more interesting for us: there are interesting and important counter-examples in the form of 2-D pictures of geometrically impossible 3-D objects, which we'll come to later. Some of them had not been discovered at the time Wittgenstein was writing.
In a philosophy graduate seminar around 1960 I offered a line drawn on a blackboard as a counter-example to 3.0321, like this:
A more complicated, more interesting, example from Reutersvard was presented above.
This pattern of variation in expression of a common genome was described in terms of a distinction between "pre-configured" and "meta-configured" competences in Chappell and Sloman 2007, which presented an epigenetic theory summarised in Figure Evo-Devo.
(This can be seen as a generalisation of Waddington's 'Epigenetic Landscape' -- for individuals that re-design the landscape during their own development.)
In at least some animals ("precocial" species) the specification for detailed spatial competences is clearly in the genome. But not in all animals. It seems that more sophisticated, more abstract, competences, capable of being instantiated in more varied ways, are provided in a more generic form in genomes for more sophisticated species (often described as "altricial" species), such as humans, squirrels, and crows, whose offspring do not show (or need) such competences until long after birth. The competences develop in a context-sensitive manner on the basis of complex interactions between the genome and the environment, as depicted in Figure Evo-Devo, above.
The diagram summarises what could be called the "Meta-configured genome" theory, though that label was not used in the original paper. This diagram and the related published papers do not yet incorporate the additional varieties of information-flow from conspecifics to the individual or vice versa involving explicit teaching and learning, co-discovery, and cultural changes.
In humans, this sort of developmental diversity is most evident in language development, which produces very different linguistic competences in different cultures -- as shown both by how children in different cultures express themselves (what they can say) and by differences in what they understand. The differences between sign-languages and spoken languages are particularly striking. (Elsewhere I have argued that sign-languages must have evolved first, and that structured internal languages evolved even earlier, and in more species: see Sloman[Vis-Lang].)
The kind of diversity spawned in this way is also evident in other intelligent competences demonstrated in games, music making, art forms, playing with construction kits, and environmentally related skills, e.g. dealing with clothing or food, swimming, climbing trees, climbing rocks, building structures out of sand, etc.
Those complex combined products of evolution and environment that take time to develop and can differ between members of the same species, both across cultures and within cultures, are well known. I am suggesting that abilities to perceive structures and structural changes may be equally complex, slow to develop, and liable to differ between species and also between members of the same species.
I hope that all this will help readers understand that asking why we do not see certain changes is the wrong question if we don't yet know how changes are seen, and what conditions need to be satisfied, both in the environment and in cognitive/perceptual mechanisms, for changes of various sorts to be detected. If mechanisms of change detection, instead of being simple and universal, are subject to the kind of developmental diversity indicated in Figure Evo-Devo, then before asking why environmental changes are not seen in some situations, we need to develop good theories as to how they are seen. Detecting some changes may require far more complex perceptual systems than those that are used in simple forms of "online" intelligence.
All this is a refinement of John McCarthy's remark (2008):
"Animal behavior, including human intelligence, evolved to survive and succeed in this complex, partially observable and very slightly controllable world. The main features of this world have existed for several billion years and should not have to be learned anew by each person or animal."
(Originally on his web site in 1996.)
The point is that much that appears to be "learnt anew" may in fact be new instantiations of previously evolved highly generic competences, some of which capture mathematical features of the environments in which ancestors evolved.
The results of such evolution can produce "meta-configured" generic genomes that are instantiated in different ways in different developmental contexts by using (sometimes very complex) parameters acquired from the environment in ways that can vary both across species and between members of the same species (e.g. young humans developing in cultures at very different levels of technical competence).
However, this kind of "parameter substitution" will be far more complex than the simple forms of parameter substitution found in mathematics or common programming languages. Compare "parametric polymorphism" in programming languages, where the effect of a parameter may depend not only on its type but also on the types of additional parameters supplied in the context. Forks have a generic type of use, but garden forks, pitchforks and table forks have different uses requiring different skills and knowledge, even though they share a common abstract schema for structure and function.
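The fork analogy can be given a rough programming illustration. The following is only a minimal sketch in Python, not a model of anything in Chappell and Sloman (2007): every name in it (Fork, GardenFork, Soil, USES, use, and so on) is a hypothetical illustration. It shows one shared abstract schema whose concrete use depends on the combination of the types supplied. In type-theory usage this kind of type-dependent behaviour is usually called ad hoc polymorphism or multiple dispatch, whereas strictly parametric code behaves uniformly for all types; the sketch follows the looser sense used in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fork:
    """Shared abstract schema (hypothetical): a handle plus some tines."""
    handle_length_cm: float
    tines: int


# Specialisations share the schema but differ in how they are used.
class GardenFork(Fork): ...
class PitchFork(Fork): ...
class TableFork(Fork): ...


# Hypothetical kinds of material a fork might be applied to.
class Soil: ...
class Hay: ...
class Pasta: ...


# The action is looked up from the PAIR of types, so the effect of the
# fork parameter depends on the type of the other parameter as well.
USES = {
    (GardenFork, Soil): "dig and turn over",
    (PitchFork, Hay): "lift and toss",
    (TableFork, Pasta): "twirl and eat",
}


def use(fork: Fork, material: object) -> str:
    """Instantiate the generic 'use a fork' schema for a concrete pairing."""
    action = USES.get((type(fork), type(material)))
    if action is None:
        return (f"no known use of {type(fork).__name__} "
                f"on {type(material).__name__}")
    return f"{action} with a {fork.tines}-tine {type(fork).__name__}"
```

For example, use(GardenFork(100.0, 4), Soil()) selects the digging action, while use(TableFork(18.0, 4), Hay()) finds no registered pairing. Even this toy lookup table is far simpler than the biological case, where the "parameters" acquired from the environment can themselves be complex structures.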
Chappell and Sloman (2007) offer a schematic model, crudely depicted in Figure Evo-Devo, of how genetic specifications at different levels of abstraction, activated at different stages of development, can combine with information acquired directly or indirectly (via earlier instantiations) from the environment at different stages of development. (This generalises Waddington's concept of an "epigenetic landscape". Our schematic theory is closer to the concept of "Representational Redescription" presented in Karmiloff-Smith (1992).)
Theories of learning based on completely general statistical mechanisms interacting with sensory data will not accurately describe learning based partly on abstract genetic information about the environment acquired previously by natural selection, in altricial species, such as humans and crows, where the genetic mechanisms make sophisticated use of parametric polymorphism, for example.
In http://www.cs.bham.ac.uk/research/projects/cogaff/misc/ijcai-2017-cog.html I try to show how in principle all this may be relevant to ancient mathematical discoveries, long before the development of modern logic, algebra, and formal axiomatic systems.
Being able to detect actual changes in the environment is one thing (though it may have surprising complexity, as discussed above) but being able to consider, and reason about, possible changes that are not occurring, and are not actions of the perceiver, is very different. Compare the discussion of perception of possible, and impossible, changes above. That is a kind of affordance perception that as far as I know Gibson did not notice, a very serious omission in theories of animal intelligence and functions of human vision. Most other researchers also seem not to notice. It is also not usually discussed in publications on the nature of mathematical discovery.
Vol 1. The role of possibility in cognitive development (1981)
Vol 2. The role of necessity in cognitive development (1983)
Tr. by Helga Feider from French in 1987
Also much of his earlier work, e.g. Piaget (1952) above.
During preparation for the talk, the notes grew and grew, and continued growing
after the talk, including added links to other related documents on the
Meta-Morphogenesis web site:
Some of these ideas and examples were also presented in a tutorial at the
SGAI 2015 AI conference in Cambridge, 15-17 December 2015.
Notes for the tutorial are here:
Many more changes, including additions and reorganisation, occurred in January
2016 in preparation for a talk in Jerusalem.