on the basis of discussions with Jackie Chappell (Biosciences Birmingham), Jeremy Wyatt, Marek Kopicki, Nick Hawes, Somboon Hongeng, the Birmingham CoSy Project, Tony Cohn (Leeds), David Hogg (Leeds), and other members of the CoSy Project
The germ of the idea was born in a conversation between Marek Kopicki and Aaron Sloman.
Domains that occur naturally include configurations of water, sand, mud, stones, trees, grass, other plants, and many kinds of animals. Some of the objects naturally move autonomously while others move only when manipulated by something else or blown by the wind, etc.
Non-natural domains include things like string, string plus other materials, paper, cloth, cotton-wool, plasticine, toy blocks, dolls and their clothing, toy vehicles, cutlery, crockery, and many kinds of construction kits, e.g. Lego, Meccano, Tinker toys, Fischertechnik, etc. (See Michael Adler's Meccano web site: http://www.meccanotec.com/ and the International Society of Meccanomen: http://www.internationalmeccanomen.org.uk/ -- why only men???)
Combining domains provides new richer domains with novel capabilities that can be explored and used. E.g. combining string and paper produces possibilities like parcels that can be pushed, thrown, etc. without falling apart. Combining cutlery, crockery and various kinds of solid and liquid edible substances produces many new possible ways of consuming food and liquid.
There are also abstract domains, such as arithmetic, many kinds of games, words, phrases, sentences, poems, limericks, and many branches of mathematics. In the last few decades new domains have been developed that young children learn to interact with using computing devices even though their ancestors never encountered anything similar. Some primates can also learn to interact with them.
Our domain of polyflaps was invented partly to produce shapes and processes that might be easier for a robot to see than some of the more common shapes (e.g. avoiding curved surfaces and curved edges), while being cheap to make, and lightweight, and indefinitely extendable.
The pictures shown below are examples of 'polyflaps'. Each polyflap is made from a single sheet of paper cut in a polygonal shape (convex or concave, regular or irregular) and then folded once, but not folded flat, so as to form a 3-D structure. (Future examples may be made of cardboard or some other material.)
Any collection of polyflaps can clearly be extended indefinitely, forming more and more complex domains of perception and action, e.g.
The set of possible polyflaps is very large (a) because there is a very large set of possible polygons, taking into account different numbers of sides, different angles between sides, and different lengths of sides, and (b) because for each such polygon a large number of more or less different polyflaps can be produced by choosing where to fold it and folding once, through an angle between 0 and 180 degrees, to produce a 3-D object.
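The two sources of variety, (a) and (b), can be made concrete with a small sketch. The Python below is illustrative only and is not part of any project software. The function name fold_polygon, the choice of the x-axis as the fold line, and the parameter lift_deg (the angle through which one flap is rotated out of the plane of the sheet) are all assumptions introduced here. It treats the sheet as perfectly rigid and infinitely thin.

```python
import math


def _lift(x, y, phi):
    """Rotate a point with y > 0 about the x-axis by phi radians; leave the rest flat."""
    if y <= 0:
        return (float(x), float(y), 0.0)
    return (float(x), y * math.cos(phi), y * math.sin(phi))


def fold_polygon(vertices, lift_deg):
    """Fold a flat polygon once along the x-axis (hypothetical helper).

    vertices: (x, y) pairs in order around the polygon.
    lift_deg: rotation of the y > 0 flap out of the plane, strictly between
              0 (still a flat sheet) and 180 (folded flat), both excluded.
    Returns the 3-D outline as a list of (x, y, z) points, with an extra
    point inserted wherever an edge of the polygon crosses the fold line.
    """
    if not 0.0 < lift_deg < 180.0:
        raise ValueError("lift_deg must be strictly between 0 and 180")
    phi = math.radians(lift_deg)
    folded = []
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        folded.append(_lift(x0, y0, phi))
        if y0 * y1 < 0:  # this edge crosses the fold line
            t = y0 / (y0 - y1)
            folded.append((x0 + t * (x1 - x0), 0.0, 0.0))
    return folded


if __name__ == "__main__":
    square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    for angle in (30, 90, 150):
        print(angle, fold_polygon(square, angle))
```

Calling fold_polygon on the same polygon with different angles, or on the same polygon shifted or rotated relative to the x-axis before folding, gives different 3-D shapes, which is the sense in which each polygon yields many polyflaps.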
Different polyflaps will have different multi-stable possibilities depending on the shape of the original polygon and how it is folded.
So there is a large set of possible configurations of a single polyflap. Each configuration can be viewed from multiple viewpoints, adding to the complexity.
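One standard idealisation of what makes a resting configuration stable is the following: a rigid object resting on a flat horizontal surface stays put if the vertical projection of its centre of mass falls strictly inside the convex hull of its contact points. The sketch below checks that condition. It is an assumption-laden illustration, not a description of how any robot in the project reasons: it ignores the flexibility of paper, friction, and polyflaps leaning on one another, and the function names are invented here.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def is_statically_stable(contact_points, com_xy):
    """True if com_xy lies strictly inside the hull of the contact points.

    contact_points: (x, y) tuples where the object touches the ground plane.
    com_xy: (x, y) projection of the centre of mass onto that plane.
    """
    hull = convex_hull(contact_points)
    if len(hull) < 3:  # a single point or a single straight edge cannot support it
        return False
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], com_xy) > 0 for i in range(n))


if __name__ == "__main__":
    # Plan view of a polyflap standing on two bottom edges that meet at a corner.
    v_footprint = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(is_statically_stable(v_footprint, (0.25, 0.25)))
    # An unfolded sheet balanced on one straight edge has collinear contacts.
    print(is_statically_stable([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], (1.0, 0.0)))
```

The collinear case shows, under these simplifications, why a single fold matters: an unfolded sheet standing on one edge has no supporting area, whereas a folded one can.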
ADDED 22 Nov 2017: Note on triangle deformation
I now realise that videos of polyflaps, either taken with a moving camera while the polyflaps are stationary, or taken with a stationary camera while polyflaps are moving, or taken with a moving camera while polyflaps are moving, will include rich examples of processes with mathematical structures that enable causal relations within the processes to be understood, even on a first viewing. I suspect this will turn out to be closely related to mechanisms required to explain the mathematical discovery process illustrated in this document on triangle deformation:
Also relevant is this discussion of the role of perception of partial orderings
and topological changes, in reducing or eliminating the need for numerical
estimation in perception of affordances and how they change:
In contrast, species described as 'precocial' by biologists are born or hatched relatively advanced physiologically, and behaviourally and cognitively competent (e.g. deer that run with the herd soon after birth, and chicks that peck for food soon after hatching). Their cognitive competence must be largely genetically determined, since there is no time to learn about the environment between birth and demonstrating the competence.
In fact, all species involve a mixture of both kinds of features as explained in . While continuing to use the labels 'precocial' and 'altricial' to refer to types of species, we now propose to use different labels for the two main kinds of capabilities found in those species.
Note that the non-preconfigured capabilities depend on the operation of "higher level" pre-configured (i.e. genetically determined) bootstrapping capabilities. Adult members of altricial species seem to have a higher proportion of non-preconfigured competences than adult members of precocial species (born or hatched physically developed and behaviourally competent).
We are interested in explaining kinds of competence in the second category, competences that are not genetically determined but arise from play, exploration, observation and imitation using powerful genetically determined 'meta-level' learning mechanisms. This document is about some possible domains in which such mechanisms might operate in robots and animals.
However, as the scenes perceived become more complex, cluttered or indistinct, the segmentation and recognition of familiar objects and the perception of spatial structure and affordances can use a mixed top-down and bottom-up perceptual processing mechanism, as frequently proposed by researchers of many sorts, e.g. here
When the lighting produces shadows or highlights or merely variations of intensity the complexities increase. Likewise structure in the background.
In addition to the variation in the possible scenes and the possible images that can be produced in those scenes, there are multiple possible actions that can be performed by a robot with a two-fingered gripper on each polyflap in each of its stable configurations. Once the polyflap is grasped and lifted there are large numbers of additional configurations, many of them not among the stable configurations available when the polyflap is resting.
Moreover, from each configuration there is a large space of possible types of movement (combinations of translation and rotation) that can lead to many other configurations.
Two views of some unfolded polyflaps lying on the carpet
(which the robot will not necessarily encounter -- e.g. because they are too hard to pick up with a typical gripper, though at a later stage the robot might learn how to use the corner of a polyflap to lift an unfolded polyflap):
Each polyflap is made by folding one of those polygonal shapes once and leaving the fold 'open' to produce a 3-D shape. Several polyflaps can be placed close together, in some cases with one resting on two or more others, to produce 3-D polyflap configurations. Some examples are in the next lot of pictures.
Can you tell which polyflap comes from which polygon?
The pictures shown here all include some hastily made polyflaps, and are merely illustrative. The polyflaps were made by cutting an old brown paper envelope into polygons, then folding each one once.
For the purposes of the research (on requirements for an altricial robot learning how to perceive, understand and use spatial affordances) I would expect to move to more carefully made polyflaps which are not as flexible as paper. Plastic polyflaps may be used by Jackie Chappell in experiments with parakeets.
No flat polygons are being used because robot hands likely to be available to us would not easily be able to lift a completely flat object. (Likewise a very young child?)
We would start with simpler configurations than those shown, e.g. providing one polyflap at a time at first.
An interesting sub-project would aim to produce a robot that when faced with a previously constructed configuration of polyflaps can dismantle it one piece at a time without knocking anything over in the process.
Another interesting sub-project would be to get a robot to develop a mathematical theory of polyflaps, e.g. specifying conditions under which they are stable, and relationships between initial polygonal shapes and the 3-D structures that can be obtained from them.
Note added 25 Sep 2017
An extension of this would include understanding proto-affordances, e.g. ways of reasoning about possible variations in polyflap configurations, and considering questions such as which are stable, which would be useful intermediary stages in constructing new more complex structures (given a particular collection of primitive polyflaps).
These would require development of topological and geometric forms of reasoning, not learning statistical regularities (which seems to have been the only use of polyflaps in working AI systems since this project was first proposed). E.g. Marek Kopicki, Sebastian Zurek, Rustam Stolkin, Thomas Moerwald, Jeremy Wyatt, 'Learning modular and transferable forward models of the motions of push manipulated objects', Autonomous Robots, vol 41, issue 5, June 2017, pp. 1061--1082, Springer, US.
So polyflap competence requires structural-generative competence, probably using forms of representation with compositional semantics. Thus we have a domain that seems to have some of the key features often attributed to linguistic competence but not involving linguistic competence.
The suggestion is that this is closely related to the kinds of domains in which altricial animals gain competence without using language, and that the abilities that evolved to support development of such competence were important precursors to evolution of linguistic capabilities.
Like Galileo's and Newton's thought experiments, the polyflap world abstracts away from a huge amount of complex and messy detail of the real world (including some of the detail involved in the ability to produce fast fluent behaviours).
We conjecture that what remains includes a substantial challenge that can lead to a major advance in our understanding of a variety of issues regarding perception of affordances, learning, reasoning, forms of representation, and architectures that build themselves while acting in the world, as seems to occur in the so-called 'altricial' species (unlike 'precocial' species whose individuals start with a much richer collection of genetically determined capabilities whose subsequent development seems to be far more circumscribed).
It is also conjectured that despite the complexity of the polyflap domain there is an interesting subset of it that can be fruitfully explored in the near future in research on cognitive robotics -- both in CoSy and in other related projects by our collaborators.
Moreover, this is only one of a collection of domains that can be investigated in similar fashion within a larger interdisciplinary collaborative research programme including cognitive robotics, linguistics, developmental psychology, neuroscience, biology, and philosophy (among others). In parallel with the polyflap domain we are also investigating simpler domains, e.g. objects being pushed around on 2-D surfaces.
Our physical world allows a huge variety of more or less distinct domains to be learnt about by children, animals, and robots, and the polyflap domain is merely one of those, selected because it seems to suit some of the constraints of what can be done with a rather limited one-armed robot with a gripper. We also expect to investigate what happens when birds with manipulative capabilities meet polyflaps. Perhaps parallel research could introduce children to polyflaps.
The polyflap domain is just one among many other domains associated with classes of toys and types of materials, e.g. blocks, sand, stones, water, mud, string, paper, pencils, crayons, paint, plasticine, Lego, Meccano, Tinker toys, dolls of many kinds and their accessories, electronic-circuit building toys, drawing and painting kits, and of course other animals and/or other humans.
We hope that collaborators in other projects will do parallel investigations, so that we can share our tools and our discoveries in a process of joint exploration.
Of course, one of the important questions is how learning about different domains supports creative new combinations, e.g. combining string with other things, in actions like wrapping up, pulling, swinging, tying down, or assembling structures from things that don't intrinsically stick together, e.g. making things out of sticks.
One conjecture underlying this work is that the notion of learnt sensorimotor contingencies will not prove rich enough to capture what a child or robot has to learn in dealing with a domain like this. The issue is discussed further in http://www.cs.bham.ac.uk/research/projects/cosy/papers/#dp0603 (COSY-DP-0603: Sensorimotor vs objective contingencies).
So we have deliberately chosen tasks where the recognition of 'whole' objects is not an intermediate goal (though it may be a side-effect), and where most of the visual perception that is produced does not depend on recognition of objects, though it may depend on learning to recognise various 3-D structure fragments and their relationships (described below) and also fragments of motion of 3-D structure fragments.
So our aim is to find ways of perceiving and understanding what is seen that do not presuppose recognition of previously encountered objects or any linguistic competence.
If we can explain the nature of that spatial understanding we may hope to shed light on a number of central features of human intelligence:
(a) generative learning capability which involves the *unending* ability to extend the ontology used for thinking about, perceiving and acting in the world
(b) the fact that thinkers of many kinds have attested to the power of human-spatial forms of representation and reasoning, and the use of spatio-temporal metaphors in thinking about and dealing with problems in many domains that are not spatial, including such things as family relationships, search strategies in abstract problem solving, and many projects of mathematics.
b: a set of primitive perceptual competences, for perceiving simple spatial structures, relations, and processes (e.g. two parts of an edge moving closer, two edge-fragments changing their relative orientation, a part of an object sliding along a part of an edge, a fragment of an object coming into contact with a part of a surface of another, or sliding along that surface)
c: mechanisms and forms of representation for encoding those information fragments
d: a set of exploration-generating mechanisms that can produce actions that in turn can be perceived
e: a set of meta-level mechanisms determining classes of phenomena that are 'interesting' and should be labelled, the information stored, and later produced in varying ways
f: a set of meta-level 'syntactic' mechanisms for combining previously learnt perceptual or action fragments, or ontological components into larger, re-usable, structures, possibly parametrised in a way that copes with continuous variation. (This can lead to exponential growth in speed of problem solving, but in limited application areas: as seems to happen in humans.)
g: an architecture in which all these resources can be combined so as to grow the competences and extend the architecture in a manner that, at least to some minimal degree of approximation, reflects hitherto unexplained aspects of learning and development in altricial animals.
Exactly which 3-D configuration fragments are learnt will be determined by the robot, as a result of applying very general criteria of 'interestingness' (still to be investigated) to processes that occur within its visual system when the robot performs various actions (e.g. moving its head or cameras), or when it passively observes things change as a result of its actions or the actions of others.
E.g. some 'interesting' fragments would be invariants during motions of various cases. Other interesting fragments might be fragments that can be created or destroyed by motion.
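As a very crude illustration of what "invariants during motions" could mean computationally, the sketch below looks for pairs of tracked features whose mutual distance stays constant across a sequence of observations. Everything in it is an assumption made for the example: the function name invariant_pairs, the representation of a frame as a dictionary from feature names to 3-D coordinates, and the use of a numerical tolerance. In particular, it relies on metric measurement, which is only one possible stand-in for the criteria of 'interestingness' that are still to be investigated.

```python
import math
from itertools import combinations


def invariant_pairs(frames, tolerance=1e-6):
    """Return pairs of feature names whose separation is constant in all frames.

    frames: list of dicts, each mapping a feature name to an (x, y, z) tuple.
            Every frame is assumed to contain the same feature names.
    tolerance: largest allowed spread between the longest and shortest
               observed distance for a pair to count as invariant.
    """
    names = sorted(frames[0])
    result = []
    for a, b in combinations(names, 2):
        distances = [math.dist(frame[a], frame[b]) for frame in frames]
        if max(distances) - min(distances) <= tolerance:
            result.append((a, b))
    return result


if __name__ == "__main__":
    # p and q translate together (as if on one rigid flap); r moves separately.
    frames = [
        {"p": (0.0, 0.0, 0.0), "q": (1.0, 0.0, 0.0), "r": (0.0, 1.0, 0.0)},
        {"p": (1.0, 0.0, 0.0), "q": (2.0, 0.0, 0.0), "r": (0.0, 3.0, 0.0)},
    ]
    print(invariant_pairs(frames))
```

Pairs reported here would be candidates for belonging to the same rigid fragment; pairs that drop out as motion continues correspond loosely to relationships that motion can create or destroy.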
processes and causation
mapping causal powers (enabling, preventing, constraining, etc.) back to the static fragments
after the development of some basic competence in seeing and acting on polyflaps, start introducing language learning processes.
(It may be that the mechanisms will be there from the start but unable to do anything. Or it may be that in humans the physical development of language learning mechanisms is 'deliberately' postponed until after infants have developed sufficient non-verbal competence to provide a rich semantics to drive the linguistic learning: it seems that an important feature of altricial species is *staggered* development of cognitive mechanisms controlled by genetic mechanisms, of which sexual cognitive development is one of the most obvious examples.)
Clearly there are various kinds of surprise (Matthias Scheutz and I have an incomplete partial conceptual analysis of surprise in a larger incomplete paper. We should make this available.)
There's noticing something that has not been seen before (lump of yogurt on the carpet in the video below) which can trigger various new actions, e.g. pick it up, put it in spoon, put it in mouth. But then having done that, a new goal can be triggered: put another lump of yogurt on carpet.
More on affect: perceived internal and external states and processes,
and various kinds of actions need to be evaluated. It's clear from
watching very young children that a lot of the motivation and evaluation
has nothing to do with satisfying biological physical needs (getting
food, reducing various kinds of physical discomfort, generating
'physical pleasures') or even social needs. There are deeper cognitive needs.
(See also this paper on "Architecture-based motivation" contrasted with "Reward-based motivation":
One of the interesting questions to be addressed in the long run on the basis of research like this could be: what are the differences in learning capabilities of human children born with or without arms, e.g. those affected by the thalidomide tragedy in the 1960s. E.g. See
Other investigations of a related type could investigate learning capabilities of blind children, using robots with only arms, and no vision.
If we can identify different sorts of 'initial' architectures etc. we may be able to come up with a new way of classifying types of animal species, going far beyond the altricial/precocial distinction.
This can also lead to new kinds of research into evolutionary trajectories and the kinds of environments that trigger, facilitate or inhibit them.
Various Polyflap-based domains could provide useful environments in which to test the ideas about pre-configured and meta-configured competences in Ref [2.b], especially the ideas about cascaded interactions between genome and environment summarised in this diagram:
Diagram based on ideas in Ref 2b.
Three multidisciplinary research proposals to investigate these ideas, by
experiments on animal cognition and work on robot cognition were turned down.
The possibilities for expanding the robot work in different kinds of directions and also the interdisciplinary collaborations are endless.
Beyond Modularity: A Developmental Perspective on Cognitive Science
See my draft discussion of that book in
Since this paper was written, individual polyflaps have been used in various robot projects (e.g. search for "pushing polyflap"), but so far only in much simpler configurations than proposed here as a challenge for research in robotics (including vision).
Note added 10 Jan 2016: Compare the Author's submission to the Royal
Society on machine learning in 2016.
Note added 22 Nov 2017
So far nobody seems to have noticed the deep opportunities for mathematical discovery -- learning about structures, relationships, effects of various changes, including effects of changes of viewpoint on how things appear -- and necessary connections between structures and processes, including impossibilities:
(Compare the notion of an "aspect graph".)
For more on the context see
The altricial precocial spectrum for robots
Paper co-authored with Jackie Chappell, for IJCAI'05.
The next paper was a sequel, expanding these ideas.
Jackie Chappell and Aaron Sloman, (2007)
Natural and artificial meta-configured altricial information-processing systems, invited paper for International Journal of Unconventional Computing, 3, 3, pp. 211--239,
Challenge to AI vision researchers and others, regarding perception of shape-based affordances.
 Baby and yogurt video
 On scenario driven research:
 Provisional work on ontologies for a pre-linguistic agent
This web site is still under construction.
The ideas are expanded in