To go beyond a system whose visual perception of the world is limited to a single viewpoint, we must give the PlayMate the competence to change its viewpoint autonomously. Once we do this, we face problems such as object permanence and correspondence between frames (although the latter is already a problem with video; we have simply chosen to ignore it so far).
Given such an ability, the PlayMate would then have to reason about things it cannot currently see. This is discussed in more detail here.
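As a rough illustration of what reasoning about unseen things could involve, here is a minimal sketch of an object memory that keeps objects after they leave the field of view. It is not the PlayMate's actual design. The names `Camera`, `WorldModel`, `observe` and `unseen` are hypothetical, the world is reduced to 2D, and correspondence between frames is assumed to be solved already (each detection arrives with a stable name).

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical 2D camera: a position, a heading, and half its field of view."""
    x: float
    y: float
    heading: float   # radians, direction the camera faces
    half_fov: float  # radians, half of the horizontal field of view

    def can_see(self, ox: float, oy: float) -> bool:
        bearing = math.atan2(oy - self.y, ox - self.x)
        # Wrap the angular difference into [-pi, pi] before comparing.
        diff = math.atan2(math.sin(bearing - self.heading),
                          math.cos(bearing - self.heading))
        return abs(diff) <= self.half_fov


class WorldModel:
    """Remembers every object ever observed, in a fixed world frame."""

    def __init__(self):
        self.objects = {}  # name -> {"pos": (x, y), "visible": bool}

    def observe(self, detections):
        """Update from one view; detections is a list of (name, x, y)."""
        seen = set()
        for name, x, y in detections:
            self.objects[name] = {"pos": (x, y), "visible": True}
            seen.add(name)
        # Object permanence: unseen objects are kept, only flagged as not visible.
        for name, entry in self.objects.items():
            if name not in seen:
                entry["visible"] = False

    def unseen(self):
        return sorted(n for n, e in self.objects.items() if not e["visible"])


# Assumed toy scene: one object in front of the robot, one behind it.
scene = [("mug", 1.0, 0.0), ("ball", -1.0, 0.0)]


def detect(camera):
    """Stand-in for a vision system: returns the objects inside the view cone."""
    return [(n, x, y) for n, x, y in scene if camera.can_see(x, y)]


model = WorldModel()
model.observe(detect(Camera(0.0, 0.0, heading=0.0, half_fov=math.pi / 4)))
model.observe(detect(Camera(0.0, 0.0, heading=math.pi, half_fov=math.pi / 4)))
print(model.unseen())
```

After the second view the mug is no longer visible, but it stays in the model with its last known position, so the system can still refer to it. A real system would also have to handle occlusion, objects that move while unobserved, and uncertain correspondence, none of which this sketch attempts.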