Human Action Recognition in Video Using DB-LSTM and ResNet

Title: Human Action Recognition in Video Using DB-LSTM and ResNet
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Mihanpour, A., Rashti, M. J., Alavi, S. E.
Conference Name: 2020 6th International Conference on Web Research (ICWR)
Date Published: April 2020
ISBN Number: 978-1-7281-1051-6
Keywords: action recognition, CNN architecture, convolutional neural nets, convolutional neural network, convolutional neural networks, DB-LSTM, DB-LSTM network, deep bidirectional LSTM networks, deep neural networks, deep video, feature extraction, human action recognition method, image motion analysis, image processing, image sequences, learning (artificial intelligence), man-machine interaction, Metrics, object detection, pubcrawl, PyTorch, recurrent neural nets, resilience, Resiliency, ResNet152, Scalability, video frames, video processing, video signal processing, video-content-based monitoring

Human action recognition in video is one of the most widely applied topics in image and video processing, with applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges, such as occlusion, camera movement, viewpoint changes, background clutter, and illumination variation. In this study, we propose a novel human action recognition method that combines convolutional neural networks (CNNs) with deep bidirectional LSTM (DB-LSTM) networks and takes only raw video frames as input. First, deep features are extracted from the video frames using ResNet152, a pre-trained CNN architecture. The sequential information across frames is then learned by the DB-LSTM network, in which multiple layers are stacked in both the forward and backward passes to increase depth. Implemented in PyTorch and evaluated on the UCF101 dataset, the proposed method reaches 95% recognition accuracy, a considerable improvement over state-of-the-art methods. The choice of CNN architecture, proper tuning of input parameters, and techniques such as data augmentation all contribute to this accuracy gain.

Citation Key: mihanpour_human_2020