Research on Applying Deep Learning to Build an Object Recognition System for Fast Goods Checkout
Authors
- Nguyễn Trí Bằng, Nguyễn Đình Vinh, Trần Trọng Đức
Keywords:
Abstract
To date, relatively little research has applied deep learning to product-checkout recognition; most existing work only uses YOLO to track changes in the number of items on store shelves. This paper presents a solution for building a real-time object recognition system for fast goods checkout. The authors use YOLOv4, TResNet, and FAISS for the object detection, feature extraction, and output image classification stages, respectively. Compared with a YOLO-only solution, this design allows data for new products to be added without retraining the system from scratch. The recognition system consists of a camera mounted above the checkout counter and a screen that displays the invoice information. In initial experiments, the system achieved an average accuracy of 94.54%, and checkout was twice as fast as barcode scanning. In addition, the authors introduce the BRC product-checkout dataset, which helps to address the shortage of data in the deep learning research community.
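As an illustration of the three-stage pipeline described above, the sketch below shows how detection, feature extraction, and similarity search could be chained. It is only a sketch under assumptions: `detect_products` (a YOLOv4 detector), `embed_crop` (a TResNet feature extractor), and the embedding size `EMBED_DIM` are hypothetical stand-ins; only the FAISS calls (`IndexFlatL2`, `add`, `search`) use that library's real API, and the sketch does not reproduce the authors' exact implementation.

```python
# Minimal sketch of the pipeline summarised in the abstract:
# (1) detect products in the camera frame, (2) embed each crop,
# (3) look up the nearest known product embedding with FAISS.
import numpy as np
import faiss

EMBED_DIM = 2048  # assumed embedding size of the feature extractor

index = faiss.IndexFlatL2(EMBED_DIM)  # exact L2 nearest-neighbour index
product_labels = []                   # product_labels[i] names vector i in the index

def register_product(name, embedding):
    """Add one reference embedding for a product; new items need no retraining."""
    vec = np.asarray(embedding, dtype="float32").reshape(1, EMBED_DIM)
    index.add(vec)
    product_labels.append(name)

def recognize(frame, detect_products, embed_crop):
    """Return the recognised product names for one checkout-camera frame."""
    items = []
    for crop in detect_products(frame):                                     # stage 1: detection
        vec = np.asarray(embed_crop(crop), dtype="float32").reshape(1, -1)  # stage 2: features
        _, nearest = index.search(vec, 1)                                   # stage 3: FAISS lookup
        items.append(product_labels[int(nearest[0][0])])
    return items
```

Because classification here is a nearest-neighbour lookup rather than a trained classifier head, adding a new product amounts to registering a few reference embeddings, which reflects the abstract's claim that new items do not require retraining from scratch.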
References
[1] B. Santra and D. P. Mukherjee, “A comprehensive survey on computer vision-based approaches for automatic identification of products in retail store”, Image and Vision Computing, 2019, vol. 86, 45–63.
[2] Juniper Research, “AI spending by retailers to reach $12 billion by 2023, driven by the promise of improved margins”, Juniper Research press release, 2019.
[3] F. D. Orel and A. Kara, “Supermarket self-checkout service quality, customer satisfaction, and loyalty: empirical evidence from an emerging market”, Journal of Retailing and Consumer Services, 2014, vol. 21, 118–129.
[4] A. C. R. Van Riel, J. Semeijn, D. Ribbink, and Y. Bomert-Peters, “Waiting for service at the checkout: negative emotional responses, store image and overall satisfaction”, Journal of Service Management, 2012, vol. 23, no. 2, 144–169.
[5] F. Morimura and K. Nishioka, “Waiting in exit-stage operations: expectation for self-checkout systems and overall satisfaction”, Journal of Marketing Channels, 2016, vol. 23, no. 4, 241–254.
[6] Athanasios Voulodimos, Nikolaos Doulamis, Anastasios Doulamis, Eftychios Protopapadakis, "Deep Learning for Computer Vision: A Brief Review", Computational Intelligence and Neuroscience, vol. 2018, Article ID 7068349, 13 pages, 2018, https://doi.org/10.1155/2018/7068349.
[7] Yuchen Wei, Son Tran, Shuxiang Xu, Byeong Kang, Matthew Springer, "Deep Learning for Retail Product Recognition: Challenges and Techniques", Computational Intelligence and Neuroscience, vol. 2020, Article ID 8875910, 23 pages, 2020. https://doi.org/10.1155/2020/8875910.
[8] L. Karlinsky, J. Shtok, Y. Tzur, and A. Tzadok, “Fine-grained recognition of thousands of object categories with single-example training”, Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017, 4113–4122.
[9] P. Follmann, T. Bottger, P. Hartinger, R. Konig, and M. Ulrich, “MVTec D2S: densely segmented supermarket dataset”, Proceedings of the 2018 European Conference on Computer Vision (ECCV), 2018.
[10] X. S. Wei, Q. Cui, L. Yang, P. Wang, and L. Liu, “RPC: a large-scale retail product checkout dataset”, 2019.
[11] Z. Zhao, P. Zheng, S. Xu and X. Wu, "Object Detection With Deep Learning: A Review”, in IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212-3232, Nov. 2019, doi: 10.1109/TNNLS.2018.2876865.
[12] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection”, Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, 779-788.
[13] Joseph Redmon, Ali Farhadi, “YOLO9000: better, faster, stronger”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7263-7271.
[14] R. Girshick, “Fast R-CNN”, Proceedings of the IEEE international conference on computer vision, 2015, 1440-1448.
[15] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks”, Advances in neural information processing systems, 2015, 91-99.
[16] Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection”, arXiv preprint arXiv:2004.10934, 2020.
[17] Joseph Redmon, Ali Farhadi, “YOLOv3: An Incremental Improvement”, arXiv preprint arXiv:1804.02767, 2018.
[18] C. G. Melek, E. B. Sonmez and S. Albayrak, “Object Detection in Shelf Images with YOLO”, IEEE EUROCON 2019 - 18th International Conference on Smart Technologies, 2019, pp. 1-5, doi: 10.1109/EUROCON.2019.8861817.
[19] Bing-Fei Wu, Wan-Ju Tseng, Yung-Shin Chen, Shih-Jhe Yao, Po-Ju Chang, “An Intelligent Self-Checkout System for Smart Retail”, International Conference on System Science and Engineering (ICSSE), 2016.
[20] Sandeep Kumar Yedla, V. M. Manikandan, Panchami V, “Real-time Scene Change Detection with Object Detection for Automated Stock Verification”, 5th International Conference on Devices, Circuits and Systems, 2020.
[21] G. Kumar and P. K. Bhatia, “A Detailed Review of Feature Extraction in Image Processing Systems”, Fourth International Conference on Advanced Computing & Communication Technologies, 2014, 5-12.
[22] Dong ping Tian, “A Review on Image Feature Extraction and Representation Techniques”, International Journal of Multimedia and Ubiquitous Engineering, Vol. 8, No. 4, 2013, 385-395.
[23] X. Jiang, “Feature extraction for image recognition and computer vision”, 2nd IEEE International Conference on Computer Science and Information Technology, 2009, 1-15.
[24] X. Lu, X. Kang, S. Nishide and F. Ren, “Object detection based on SSD-ResNet”, IEEE 6th International Conference on Cloud Computing and Intelligence Systems (CCIS), 2019, 89-92.
[25] M. F. Haque, H. Lim and D. Kang, "Object Detection Based on VGG with ResNet Network”, 2019 International Conference on Electronics, Information, and Communication (ICEIC), 2019, pp. 1-3, doi: 10.23919/ELINFOCOM.2019.8706476.
[26] Tal Ridnik, Hussam Lawen, Asaf Noy, Emanuel Ben Baruch, Gilad Sharir, Itamar Friedman, “TResNet: High Performance GPU-Dedicated Architecture”, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 1400-1409.
[27] Samuel Rota Bulo, Lorenzo Porzi, and Peter Kontschieder, “In-place activated batchnorm for memory-optimized training of dnns”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[28] Jie Hu, Li Shen, and Gang Sun, “Squeeze-and-excitation networks”, Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, 7132–7141.
[29] L. Xie, R. Hong, B. Zhang, and Q. Tian, “Image Classification and Retrieval are ONE”, Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR ’15), 2015, 3-10.
[30] M. Wang, Y. Ming, Q. Liu and J. Yin, “Similarity search for image retrieval via local-constrained linear coding”, 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2017, 1-6.
[31] M. M. Rahman, P. Bhattacharya, and B. C. Desai, “Similarity Searching in Image Retrieval with Statistical Distance Measures and Supervised Learning”, Pattern Recognition and Data Mining (ICAPR 2005), 2005, vol. 3686, pp. 315-324, https://doi.org/10.1007/11551188_34.
[32] Jeff Johnson, Matthijs Douze, Hervé Jégou, “Billion-scale similarity search with GPUs”, arXiv preprint arXiv:1702.08734, 2017.
[33] GitHub, “Hierarchical Navigable Small World”, Release v0.5.0, https://github.com/nmslib/hnswlib, 2021.
[34] Tzutalin, “LabelImg”, GitHub repository, https://github.com/tzutalin/labelImg.