In the scenarios we have been considering, there are only ever two animate entities: the PlayMate itself and its tutor (the human it interacts with). We will keep this restriction for the end-of-CoSy system too, although it should not lead us to over-simplify our assumptions about the source of action in the environment.
For the PlayMate perceiving itself, the most important facet is perceiving its own arm, including the arm's relationship to the rest of the PlayMate (especially its cameras) and to the world. This will be critical for flexible manipulation under online control (i.e. not relying on ballistic movements and inverse kinematics). Perception of the arm should be supported by vision, proprioception, and possibly any haptic sensing we can add to the arm.
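To make the contrast with ballistic movement concrete, here is a minimal sketch of what online control could look like: the hand position is re-perceived on every step and a small correction is issued, rather than a full trajectory being computed once and executed blind. Everything in it is an assumption for illustration only. The names servo_to_target and ToyArm, the gain and tolerance values, and the simple proportional rule are hypothetical and are not part of the PlayMate design; on the real system the perceive callback would stand in for a fused estimate from vision and proprioception.

```python
import math


def servo_to_target(perceive_hand, command_step, target,
                    gain=0.5, tol=1e-3, max_steps=200):
    """Closed-loop reach: re-perceive the hand each step and correct toward the target.

    perceive_hand and command_step are hypothetical callbacks standing in for
    the arm's perception and motor interfaces.
    Returns the number of corrective steps issued before the hand was within tol.
    """
    for step in range(max_steps):
        hand = perceive_hand()                       # fresh estimate every iteration
        error = [t - h for t, h in zip(target, hand)]
        if math.sqrt(sum(e * e for e in error)) < tol:
            return step
        # Small proportional correction, not a precomputed trajectory.
        command_step([gain * e for e in error])
    return max_steps


class ToyArm:
    """Simulated stand-in for the arm so the sketch runs on its own (assumption)."""

    def __init__(self, start):
        self.pos = list(start)

    def perceive(self):
        return list(self.pos)

    def move(self, delta):
        self.pos = [p + d for p, d in zip(self.pos, delta)]


arm = ToyArm([0.0, 0.0, 0.0])
target = [0.3, 0.1, 0.2]
steps = servo_to_target(arm.perceive, arm.move, target)
print(steps, arm.pos)
```

Because the loop only ever uses the current perceived error, it would keep working if the target or the arm were disturbed mid-reach, which is the flexibility the paragraph above is asking for. A ballistic reach would instead need the whole movement replanned.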
In terms of perceiving the tutor, the PlayMate must be able to understand the language the tutor produces. For the CoSy system we will limit the linguistic interactions between the PlayMate and its tutor to an informed subset of possible interactions. This limit should be set by someone else, though! The PlayMate must also be able to understand any relevant gestures made by the tutor. By the end of CoSy we should be able to support deictic gestures such as pointing.