Robots With Vision That Find Objects
ABSTRACT
Consider the following problem: A robot is instructed in language to find an object (from a generic class) in a cluttered room. For example, the robot may be asked to find an “apple” or a “cup”. We take a bio-inspired, active approach to this problem that combines vision, action, and higher-level cognition.
In the past, active vision (the approach of considering an active agent whose selective actions facilitate perception) has been studied in the context of navigation tasks. Here we consider an active vision approach in the context of recognition. At a high level, our approach consists of three modules: an attention mechanism that finds interesting parts of the scene, a segmentation mechanism that separates foreground regions from background, and a recognition mechanism based on shape descriptors of contours and surfaces and on the relationship of object affordances to their shape.
This year we developed a new mid-level vision mechanism, called the torque operator, which captures the concept of closed contours. The torque operator takes edges as input and computes, over regions of different sizes, a measure of how well the edges are aligned to form closed, convex contours.
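To make the idea of a torque-like measure concrete, the following is a minimal sketch and not the project's implementation. It assumes that each edge point carries a unit tangent vector, that the measure for a square patch is the sum of the 2D cross products between the displacement from the patch center and the tangent, and that this sum is normalized by twice the patch area. The function names (patch_torque, best_scale, circle_edges) and the normalization are illustrative assumptions.

```python
import math


def patch_torque(edges, center, half_size):
    """Torque-like score of a square patch (illustrative assumption).

    edges: list of (x, y, tx, ty), an edge point with unit tangent (tx, ty).
    center: (cx, cy) of the patch; half_size: half the patch side length.
    """
    cx, cy = center
    total = 0.0
    for x, y, tx, ty in edges:
        rx, ry = x - cx, y - cy
        # Only edge points that fall inside the square patch contribute.
        if abs(rx) <= half_size and abs(ry) <= half_size:
            # 2D cross product of displacement and tangent.
            total += rx * ty - ry * tx
    # Assumed normalization: twice the patch area in pixels.
    area = (2 * half_size + 1) ** 2
    return total / (2.0 * area)


def best_scale(edges, center, half_sizes):
    """Return the patch half-size with the largest absolute score."""
    return max(half_sizes, key=lambda h: abs(patch_torque(edges, center, h)))


def circle_edges(cx, cy, radius, n=64):
    """Synthetic closed, convex contour with counterclockwise tangents."""
    edges = []
    for k in range(n):
        a = 2.0 * math.pi * k / n
        edges.append((cx + radius * math.cos(a), cy + radius * math.sin(a),
                      -math.sin(a), math.cos(a)))
    return edges


if __name__ == "__main__":
    edges = circle_edges(0.0, 0.0, 10.0)
    print(patch_torque(edges, (0.0, 0.0), 12))    # patch enclosing the contour
    print(patch_torque(edges, (30.0, 30.0), 12))  # patch containing no edges
    print(best_scale(edges, (0.0, 0.0), [5, 8, 10, 12, 15]))
```

In this sketch, tangents that circulate consistently around the patch center add up, while tangents of unrelated edges tend to cancel, so a patch that tightly encloses a closed, convex contour scores highest in magnitude. Scanning several patch sizes and keeping the strongest response is one simple way to evaluate the measure over regions of different sizes.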
First, the torque operator was implemented and tested as a generic mechanism for visual attention and segmentation. Second, high-level knowledge about object properties was incorporated into the torque operator. Specifically, known size and global shape were used in the attention process, and characteristic features of object contours, acquired through learning, were used to modify the torque mechanism so that attention and segmentation are biased toward specific object classes.
AWARD ID: 1035542