Cyber-physical systems of the near future will collaborate with humans. Such cognitive systems will need to understand what humans are doing: they will need to interpret human action in real time and predict humans' immediate intentions in complex, noisy and cluttered environments. This proposal puts forward a new architecture for cognitive cyber-physical systems that can understand complex human activities, focusing specifically on manipulation activities. The proposed architecture, motivated by biological perception and control, consists of three layers. At the bottom layer are vision processes that detect, recognize and track humans, their body parts, objects, tools, and object geometry. The middle layer contains symbolic models of the human activity; through a grammatical description, it assembles the signal components recognized by the bottom layer into a representation of the ongoing activity. Finally, at the top layer is the cognitive control, which decides which parts of the scene will be processed next and which algorithms will be applied where. It modulates the vision processes by fetching additional knowledge when needed, and it directs attention by steering the sensors of the active vision system to specific places. Thus, the bottom layer is the perception, the middle layer is the cognition, and the top layer is the control. All layers have access to a knowledge base, built in offline processes, which contains the semantics of the actions.
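As an illustrative sketch only (the proposal specifies the architecture, not an implementation; every class, method, and rule name below is hypothetical), the three layers might compose along these lines, with the middle layer parsing detections from the vision layer into an activity representation and the top layer choosing which regions to process next:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch of the proposed three-layer architecture.
# None of these names come from the proposal; they only illustrate
# how perception, cognition, and control could be composed.

@dataclass
class Detection:
    label: str                          # e.g. "hand", "screwdriver"
    confidence: float
    region: Tuple[int, int, int, int]   # image region (x, y, w, h)

class VisionLayer:
    """Bottom layer: detect, recognize and track humans, body parts,
    objects, tools, and object geometry (stubbed here)."""
    def process(self, frame, regions) -> List[Detection]:
        # Run detectors only on the regions selected by the control
        # layer; this stub pretends a hand and a tool were found.
        return [Detection("hand", 0.92, regions[0]),
                Detection("screwdriver", 0.85, regions[0])]

class ActivityGrammar:
    """Middle layer: assemble recognized components into a symbolic
    representation of the ongoing activity via grammar-like rules."""
    def __init__(self, rules):
        self.rules = rules              # e.g. {"grasp_tool": {"hand", "screwdriver"}}
        self.parsed: List[str] = []

    def update(self, detections: List[Detection]) -> List[str]:
        labels = {d.label for d in detections}
        for action, needed in self.rules.items():
            if needed <= labels and action not in self.parsed:
                self.parsed.append(action)
        return self.parsed

class CognitiveControl:
    """Top layer: decide which scene regions are processed next and
    where the active vision system directs its sensors."""
    def select_regions(self, parsed, frame_size):
        w, h = frame_size
        # Placeholder policy: attend to the full frame until an action
        # has been parsed, then (hypothetically) narrow the focus.
        return [(0, 0, w, h)] if not parsed else [(w // 4, h // 4, w // 2, h // 2)]

# One iteration of the perceive-parse-control loop.
vision = VisionLayer()
grammar = ActivityGrammar({"grasp_tool": {"hand", "screwdriver"}})
control = CognitiveControl()

regions = control.select_regions([], (640, 480))
detections = vision.process(frame=None, regions=regions)
print(grammar.update(detections))       # -> ['grasp_tool']
```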
The feasibility of the approach will be demonstrated through the development of a smart manufacturing system, called MONA LISA, which assists humans in assembly tasks. The system will monitor humans as they perform an assembly task; it will recognize each assembly action, determine whether it is correct, communicate possible errors to the human, and suggest ways to proceed. The system will combine advanced visual sensing and perception; action understanding grounded in robotics and human studies; semantic and procedural-like memory and reasoning; and a control module linking high-level reasoning and low-level perception for real-time, reactive and proactive engagement with the human assembler.
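A minimal sketch of the intended monitor-and-advise behavior, again with invented names (the proposal describes what the system does, not an API): each recognized action is checked against the expected assembly step, and mismatches produce feedback for the human.

```python
# Hypothetical sketch of MONA LISA's monitoring loop. The step
# sequence, action labels, and messages are illustrative assumptions.

EXPECTED_STEPS = ["place_base", "insert_screw", "attach_cover"]

def check_action(step_index, recognized_action):
    """Compare a recognized action against the expected assembly step."""
    if step_index >= len(EXPECTED_STEPS):
        return step_index, "Assembly already complete."
    expected = EXPECTED_STEPS[step_index]
    if recognized_action == expected:
        return step_index + 1, f"OK: '{recognized_action}' completed."
    return step_index, (
        f"Error: saw '{recognized_action}' but expected '{expected}'. "
        f"Suggestion: undo the last action and perform '{expected}'."
    )

# Example run over a stream of recognized actions (one is out of order).
step = 0
for action in ["place_base", "attach_cover", "insert_screw", "attach_cover"]:
    step, message = check_action(step, action)
    print(message)   # in the real system this would be spoken or displayed
```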
The proposed work will bring new tools and methodology to the areas of sensor networks and robotics and is applicable, beyond smart manufacturing, to a wide variety of sectors and applications. The ability to analyze human behavior with vision sensors will have impact on many domains, ranging from healthcare and advanced driver assistance to human-robot collaboration. The project will also catalyze K-12 outreach, new undergraduate and graduate courseware, publications, and open-source software.
University of Maryland at College Park
National Science Foundation
Submitted by Cornelia Fermuller on March 31st, 2016