The Combination of Face Identification and Action Recognition for Fall Detection
Authors
Ngu D. Dao, The University of Danang - University of Science and Technology
Thien V. Le, The University of Danang - University of Science and Technology
Hanh T. M. Tran, The University of Danang - University of Science and Technology
Yen T. H. Nguyen, The University of Danang - University of Science and Technology
Tuan D. Duy, The University of Danang - University of Science and Technology
Abstract
Falls are common accidents that can result in serious injuries such as broken bones and head trauma. Detecting falls promptly, getting fall victims to the emergency room, and notifying their families in time are very important. In this paper, we propose a method that combines face recognition and action recognition for fall detection. Specifically, we recognize seven basic actions that occur in the daily life of elderly people, based on skeleton data extracted with the YOLOv7-Pose model. Two deep models, the Spatial Temporal Graph Convolutional Network (ST-GCN) and Long Short-Term Memory (LSTM), are employed for action recognition on the skeleton data. Experimental results on our dataset show that the ST-GCN model achieves an accuracy of 90%, which is 7% higher than that of the LSTM model.