Reinforcement Learning inspired Deep Learned Compositional Model for Decision Making in Tracking

Title: Reinforcement Learning inspired Deep Learned Compositional Model for Decision Making in Tracking
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Chakraborty, Anit; Dutta, Sayandip; Bhattacharyya, Siddhartha; Platos, Jan; Snasel, Vaclav
Conference Name: 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN)
Date Published: Nov. 2018
ISBN Number: 978-1-5386-7638-7
Keywords: cluttered background, Compositional Models, compositionality, Computational modeling, Computer science, decision making, deep learned compositional model, Deep Neural Network, deep reinforcement, heuristic methods, human bodies, human inputs, image representation, incessant decision making, learning (artificial intelligence), Markov processes, Mathematical model, multiple large video datasets, neural nets, object tracking, occlusion handling, partial occlusions, partially observable Markov decision making, penalty based training, pose estimation, pose estimation capabilities, previous location, pubcrawl, reinforcement learning, skeleton based part representation, Streaming media, target tracking, tracker, Tracking, video signal processing

We formulate a tracker that performs incessant decision making in order to track objects under challenges such as partial occlusion, a moving camera, and a cluttered background. At each step, the agent must decide whether to keep tracking an object that is occluded or has temporarily moved out of the frame, relying on a prediction from its previous location, or to reinitialize the tracker on the belief that the target has been lost. Instead of heuristic methods, we depend on reward- and penalty-based training, which helps the agent reach an optimal solution by treating the problem as a partially observable Markov decision process (POMDP). Furthermore, we employ a deep learned compositional model to estimate human pose, so that occlusion is handled better without human input. By learning the compositionality of human bodies with a deep neural network, the agent can decide more reliably whether a human is present in a frame under occlusion. We adopt a skeleton-based part representation and do away with the large spatial state requirement. This especially helps when the orientation of the target is unusual. Finally, we demonstrate that deep reinforcement learning based training, coupled with pose estimation, allows us to train on and tag multiple large video datasets much more quickly than previous works.
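
The keep-or-reinitialize decision described in the abstract can be illustrated with a toy example. The sketch below is not the authors' model: it swaps in plain tabular Q-learning for the deep network, and every name in it (KEEP, REINIT, N_BINS, observe, step, train, greedy_policy), the reward values of +1 and -1, the 0.2 chance of losing the target, and the four-bin confidence score are assumptions made for illustration. The observations also separate the hidden states cleanly, which a real tracker under occlusion would not enjoy, so the example only shows how reward and penalty can replace a hand-written rule.

```python
import random

KEEP, REINIT = 0, 1   # hypothetical action names
N_BINS = 4            # assumed discretised confidence score: 0 = lowest, 3 = highest


def observe(lost, rng):
    """Return a confidence bin: low bins when the target is lost, high otherwise."""
    return rng.choice([0, 0, 0, 1]) if lost else rng.choice([2, 3, 3, 3])


def step(lost, action, rng):
    """Apply an action to the hidden state and return (reward, next_lost)."""
    if lost:
        # Reinitialising recovers a lost target; holding on to it is penalised.
        return (1.0, False) if action == REINIT else (-1.0, True)
    # Target present: keeping the track is rewarded, a needless reset is penalised.
    reward = 1.0 if action == KEEP else -1.0
    next_lost = rng.random() < 0.2  # assumed chance of losing the target next frame
    return reward, next_lost


def train(steps=50_000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_BINS)]
    lost = False
    obs = observe(lost, rng)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice([KEEP, REINIT])
        else:
            action = KEEP if q[obs][KEEP] >= q[obs][REINIT] else REINIT
        reward, lost = step(lost, action, rng)
        next_obs = observe(lost, rng)
        # Standard Q-learning update towards reward plus discounted best next value.
        target = reward + gamma * max(q[next_obs])
        q[obs][action] += alpha * (target - q[obs][action])
        obs = next_obs
    return q


def greedy_policy(q):
    """Pick the higher-valued action for each confidence bin."""
    return [KEEP if row[KEEP] >= row[REINIT] else REINIT for row in q]


policy = greedy_policy(train())
```

In this toy setup the learned policy reinitializes at the two low-confidence bins and keeps tracking at the two high-confidence bins, without that rule ever being written by hand.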

Citation Key: chakraborty_reinforcement_2018