Birmingham Web Site for the EC-Funded CoSy Project
Cognitive Systems for Cognitive Assistants


The Birmingham part of the CoSy project includes collaboration with the other partners on most of the work-packages, but the main work in Birmingham will focus on attempting to integrate, within a single robot, a variety of capabilities normally studied separately in different branches of AI (and also studied separately by different researchers in psychology and neuroscience). The robot to be built at Birmingham will mainly be concerned with the 'PlayMate' scenario, involving manipulation of 3-D objects on a table top. In parallel with this, other CoSy partners will be working on the 'Explorer' scenario, involving a mobile robot.

Work at Birmingham will focus on:

The summary of our 12-month objective is:

PlayMate will be presented with various objects lying on a table and asked to perform various tasks. While doing this it may also answer questions, ask questions and react to changes in the scene.
Initially the linguistic interactions will use screen and keyboard, but when the speech processing experts in the CoSy project can interface their software with the rest of the robot software, spoken language will be used.

For more information on the whole project see the project summaries available here.

Hardware to be used
The robot consists of two parts:

- a B21 mobile robot with stereo cameras standing next to the table top and able to move a little as needed


- a Katana robot arm with six degrees of freedom, including a gripper with two fingers, mounted on the table and able to reach and manipulate objects on the table including things like blocks, small soft toys, wooden or metal cups and saucers, etc., as shown in these images.

Both the mobile robot and the arm will be connected to powerful computers running a version of the Linux operating system (which is used for all our software development), including a Sun W2001z dual-CPU 2.6 GHz AMD Opteron machine (64-bit).

A microphone and speakers are also provided for verbal communication between the robot and humans.

The PlayMate robot (PM) will acquire and use information about a collection of objects on the table which it can manipulate, as can a human sitting nearby. In later stages both PM and the human will manipulate the objects, whereas in the earliest experiments, the only things moving will be PM's arm and perhaps its head (as it shifts position to change its viewpoint). PM will need to be able to perceive and identify the objects on the table, and also be aware that there is a person sitting nearby looking at and talking about the same collection of objects. PM will have to know something about its own viewpoint, for instance in deciding when to change the viewpoint in order to see something better. For some of the interactions later in the project, the robot will also need to understand the person's viewpoint, e.g. knowing what is and is not within reach of the person and what the person can and cannot see (e.g. because some objects are behind others).

The initial tasks will include giving the robot

NOTE 1 on the Nature/Nurture issue

Many projects assume that because it is so difficult (perhaps even impossible) to design human-like robots, the only sensible strategy is to design something that can learn, perhaps as a new-born human does, and then rely on training instead of programming. There are many reasons why we have deliberately not followed that route, including the difficulty of finding out what sorts of learning abilities new-born humans have, or what sorts of innate learning abilities will suffice for the tasks: much of developmental psychology at that stage is guesswork. We conjecture that most of what a newborn infant is actually doing is not observable in experimental situations, e.g. building, or extending, an information-processing architecture.

One way to investigate that is to design something that has at least some of the capabilities of an older child and then, having found out what sorts of architectures, mechanisms, forms of representation, etc. actually work, try to work backwards to investigate what sort of learning system could achieve that. This may not succeed, but we prefer to start from (simplified versions of) things we know some, or most, children can do and see how much of it we can put together in a working system, even if many aspects of the implementation are biologically implausible.

We can also investigate what can be learnt on top of those initial abilities.

Another factor influencing our thinking is that although there are animals that learn a huge amount in their lifetime, developing from an initial state of near helplessness (i.e. members of altricial species), it is unlikely that the only thing evolution has provided for them, apart from their physical mechanisms, is a general-purpose learning system. More likely, the innate learning capabilities are closely tailored to many features of the environment, and which features those are and how the innate mechanisms relate to them will differ from one species to another, even if there are some commonalities. For instance, an animal that manipulates objects only or mostly with its beak (like nest-building birds) and an animal that can manipulate things with two independently moving hands while its eyes remain relatively still will need to learn different things about space, time and movement. So perhaps crows and primates start with significantly different learning mechanisms. What the latter have to learn may be one of the things we find out from the exploration described below.

There is more on this topic after the draft scenario descriptions.

The initial practical tasks, and some of the later tasks in the project will include the following (making use of as much pre-existing code as possible from elsewhere):

  1. Developing visual procedures for analysing monocular and stereo images so as to segment the objects on the table, recognize them, and determine their locations in 3-D space. Where appropriate this will use active vision: i.e. the robot may move in order to vary the appearance of the scene, so as to get more information about occlusion relationships, 3-D position, etc.

  2. Calibrating the robot's understanding of the position and relationships of the arm (or arms) in 3-D space, using both vision and feedback from sensors in the arm.

  3. Developing procedures for getting the hand close to a specified object on the table, moving from any arbitrary position. The motion need not be either fast or smooth initially: e.g. a series of slow approximately linear moves may suffice for the initial tasks. (Learning to move smoothly could come later.)

  4. Enabling the robot to use its perceptual, control, and planning capabilities to obey a simple command such as 'Point at the big green block'. (We assume that pointing at an object means bringing the hand, or part of the hand, close to the object or possibly touching it gently.)

  5. Being able to tell when the command cannot be obeyed either because what is referred to does not exist, or the referring expression is ambiguous, or the task is not within the robot's capability -- e.g. if 'pointing' means bringing the hand close, and the object is visible but far out of reach.
    For an existing 'toy' demo illustrating some of the points, based on work originally done in the early 1970s see http://www.cs.bham.ac.uk/research/poplog/figs/simagent/#gblocks
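The outcome classification described in task 5 can be sketched in a few lines. The following is a minimal illustration, not part of the actual CoSy software: the object attributes, the scene representation, and the reach threshold are all invented for the example, and a real system would resolve referring expressions against its visual scene model rather than a hand-built list.

```python
# Hypothetical sketch of task 5: classifying why a command such as
# 'Point at the big green block' can or cannot be obeyed.
# Object attributes and the reach test are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str
    colour: str
    size: str
    distance_cm: float   # distance from the arm's base (assumed)

ARM_REACH_CM = 60.0      # assumed maximum reach of the arm

def resolve_command(colour, size, scene):
    """Classify the outcome of a 'Point at the <size> <colour> thing' command."""
    matches = [o for o in scene
               if o.colour == colour and o.size == size]
    if not matches:
        return ("no-referent", None)      # nothing fits the description
    if len(matches) > 1:
        return ("ambiguous", matches)     # description fits several objects
    target = matches[0]
    if target.distance_cm > ARM_REACH_CM:
        return ("out-of-reach", target)   # visible but cannot be pointed at
    return ("ok", target)

scene = [SceneObject("block1", "green", "big", 30.0),
         SceneObject("block2", "green", "small", 40.0),
         SceneObject("cup1", "red", "big", 90.0)]

print(resolve_command("green", "big", scene))   # unique, reachable referent
print(resolve_command("blue", "big", scene))    # no such object
print(resolve_command("red", "big", scene))     # exists, but beyond the arm's reach
```

The point of the sketch is that the three failure modes listed in task 5 (non-existent referent, ambiguous referent, task beyond the robot's capability) fall out of distinct checks, each of which could trigger a different verbal response.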

  6. Investigating requirements for obeying more complex instructions such as 'Point at all the green things'. This requires the ability to ensure that nothing is pointed at twice or left out. Some contexts will make this easy, e.g. when there are few green things arranged in a row. Others will make it much harder, e.g. when there are many green things arranged randomly among non-green things. In general this will require the robot not only to carry out actions, but to know what it has and has not done. Moreover, it will need to know the difference between contexts where it can let the environment carry the short-term memory load and those where it will have to carry the load itself.

  7. Still more complex tasks would be to follow up the previous action by responding to:
    • 'Point at them in the reverse order'.
      This may be easy in some cases (e.g. objects arranged in a row) and harder in others, e.g. if the robot generated pointing actions without remembering the sequence of objects pointed at.
      (This raises issues about episodic memory).
    • 'How many are there?'
    • 'You missed the one behind the big block'
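The episodic memory needed for tasks 6 and 7 amounts to keeping an ordered, duplicate-free log of pointing actions. The sketch below is illustrative only: `point_at` is a stand-in for the real arm controller, and the object identifiers are invented.

```python
# Hypothetical sketch of the episodic memory for tasks 6 and 7: an
# ordered log of pointing actions lets the robot avoid pointing twice,
# answer 'How many are there?', and replay the sequence in reverse.

class PointingEpisode:
    def __init__(self):
        self.log = []                 # ordered record of objects pointed at

    def point_at(self, obj):
        # (in the real system: move the hand close to obj)
        self.log.append(obj)

    def point_at_all(self, objects):
        for obj in objects:
            if obj not in self.log:   # never point at the same thing twice
                self.point_at(obj)

    def reverse_order(self):
        # Possible only because the sequence was remembered,
        # not regenerated from the scene.
        return list(reversed(self.log))

    def how_many(self):
        return len(self.log)

episode = PointingEpisode()
episode.point_at_all(["g1", "g2", "g3", "g2"])  # duplicate is skipped
print(episode.how_many())          # 3
print(episode.reverse_order())     # ['g3', 'g2', 'g1']
```

A robot that generated each pointing action directly from the scene, with no such log, could still point at all the green things but could not answer the reverse-order question, which is exactly the issue about episodic memory raised above.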

  8. There are many more tasks that involve pointing, understanding, saying things, etc., for instance
    1. 'Point at all the red things then at all the round things, and say "red" or "round" for each one'.
    2. 'Point at everything that is red or round and say "red" or "round" for each one'.
      (The robot should notice the ambiguity about what to do when an object is both red and round.)
    3. 'How many of the squares are not red?'
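The ambiguity noted in task 8.2 can be made concrete: when an object satisfies both predicates, the instruction does not determine which word to say. The predicates and dictionary representation below are assumptions for illustration only.

```python
# Hypothetical sketch of noticing the ambiguity in task 8.2: an object
# that is both red and round leaves the choice of word underdetermined.
# The attribute representation is an illustrative assumption.

def label_for(obj):
    """Return the word to say for obj under 'say "red" or "round"'."""
    words = []
    if obj.get("colour") == "red":
        words.append("red")
    if obj.get("shape") == "round":
        words.append("round")
    if not words:
        return ("none", None)         # object should not have been selected
    if len(words) == 2:
        return ("ambiguous", words)   # both apply: instruction underdetermined
    return ("ok", words[0])

print(label_for({"colour": "red", "shape": "square"}))  # ('ok', 'red')
print(label_for({"colour": "red", "shape": "round"}))   # ambiguous case
```

Detecting the ambiguous case is what would allow the robot to ask a clarifying question rather than pick a word arbitrarily.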

  9. More complex tasks may involve partially hidden objects which the person can see but not the robot, and vice versa, where the robot asks for or gives information. These will be addressed later in the project.

  10. Later stages of the project will include manipulation of 3-D objects for various purposes, e.g.
    • moving something out of the way,
    • arranging things according to instructions given,
    • completing an incomplete construction,
    • collaborating with the human in constructing a bridge which has to go through an unstable state in which three hands are needed, etc.,
    • explaining why a particular move was made,
    • saying what would have been done if the configuration had been different,
    • explaining why a certain proposed action will not succeed,
    • answering questions of the form 'Why didn't you do it by ....?'
      and so on.

  11. A major task is designing the architecture that will combine all the different kinds of functionality (a task related to but distinct from designing tools to help with design and implementation of an integrated system). A partial specification of the architecture for an early prototype (labelled 'Kitty') is available here. (HTML)

NOTE 2 on the Nature/Nurture trade-off
There are many unanswered questions about trade-offs between innate capabilities and learning. Part of our task will be to find out the pros and cons of providing innate knowledge vs allowing the robot to learn.

It is clear that in animals evolution provides the result of millions of years of exploration of possible initial designs. But what is innate and what is learnt, especially in humans, remains a highly controversial topic. It may be that we need to find new ways of posing the problem, e.g. by allowing the innate endowment to include powerful but not totally domain-neutral bootstrapping capabilities, in addition to general learning capabilities and highly specific innate capabilities (such as sucking in humans, or finding and pecking at food in chicks).

We hope to liaise with developmental psychologists in order to find out how much is known about what children of different ages can and cannot do in relation to tasks such as these. There may also be interesting evidence from brain damaged patients about how such capabilities are typically decomposed in humans.

The CoSy 'Explorer' Scenario

The CoSy project has another scenario planned, to be developed in parallel with the PlayMate scenario: the Explorer scenario, involving a mobile robot that will perceive, learn about, move around in, and communicate about locations in a building, routes between them, and where various movable things happen to be. This work will not be done at Birmingham, though we expect to liaise closely with the sites concerned.

The aim of the CoSy project is to develop the two scenarios using as much commonality in the designs and tools as possible, so as to be able to merge them at a later date in a robot which is mobile and also able to understand small-scale 3-D spatial structures and affordances well enough to perform, and communicate about, 3-D manipulations as above.

On the methodology of scenario driven research

Some comments on the use of multiple scenarios, and on the choice of scenarios, can be found in these notes, prepared for the UK Grand Challenge project on 'Architecture of Brain and Mind': 'Metrics and Targets for a Grand Challenge Project Aiming to produce a child-like robot'

Additional papers and presentations, reporting ideas developed during the first year of the project can be found in the Birmingham CoSy papers directory and on the main CoSy web site in the 'Results' section.

The papers directory includes technical reports (some published), discussion papers (e.g. some web sites) and presentations, e.g. this Members' poster presentation on 'Putting the Pieces of AI Together Again' at AAAI'06.

We have also been developing a web-based tool to help with the difficult task of specifying requirements for work to be done on different time scales, including the very long term, the end of the project and the immediate future. A draft version of the tool can be found here (a matrix of requirements).

A humbling web site for roboticists

Specification for an 'ironing robot'
By Maria Petrou (Imperial College)
The background story (Editorial of the IAPR Newsletter, Volume 19, Number 4, 1997, with cartoons).

Last updated: 4 Sep 2006
Aaron Sloman and Jeremy Wyatt