MSAL-MIR: Multi-Stage Adaptive Loss for Medical Image Retrieval
Authors
- Nguyen Van Hoang Phuc, The University of Danang - University of Science and Technology, Vietnam
- Le Quang Nhat, The University of Danang - University of Science and Technology, Vietnam
- Duong Manh Quan, The University of Danang - University of Science and Technology, Vietnam
- Hoang Phuong Le, The University of Danang - University of Science and Technology, Vietnam
- Nguyen Van Hieu, The University of Danang - University of Science and Technology, Vietnam
Keywords:
Abstract
Efficient and accurate retrieval of medical images underpins timely diagnosis and informed clinical decisions. This work introduces a novel multi-stage training paradigm designed for medical image retrieval. In the first stage, a ConvNeXt model pretrained on ImageNet is fine-tuned using Focal Loss to address class imbalance. Building on this foundation, the feature space is refined with Triplet Margin Loss, where carefully selected sample triplets enhance discriminative learning. Our approach further streamlines retrieval by applying Global Max Pooling, L2 normalization, and Principal Component Analysis (PCA) for dimensionality reduction, followed by integration with Facebook AI Similarity Search (FAISS) for efficient similarity search. Experiments on the ISIC 2017 and COVID-19 chest X-ray datasets demonstrate that the proposed method achieves significant improvements in evaluation metrics, including mean Average Precision at 5 (mAP@5), Precision at 1 (P@1), and Precision at 5 (P@5).
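The post-embedding stages described above (Global Max Pooling, L2 normalization, PCA, then similarity search) can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' implementation: the array shapes are assumed, PCA is done via SVD, and a brute-force inner-product search stands in for a FAISS index (equivalent to `faiss.IndexFlatIP` on unit-normalized vectors).

```python
import numpy as np

def global_max_pool(feature_maps):
    """Collapse (N, C, H, W) convolutional feature maps to (N, C) descriptors."""
    return feature_maps.max(axis=(2, 3))

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fit_pca(x, n_components):
    """Learn a PCA projection (mean, components) from gallery descriptors."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def search(gallery, queries, k=5):
    """Brute-force inner-product search; on unit vectors this is cosine similarity."""
    scores = queries @ gallery.T
    return np.argsort(-scores, axis=1)[:, :k]  # top-k gallery indices per query

# Toy example with random stand-in feature maps (shapes are assumptions).
rng = np.random.default_rng(0)
maps = rng.standard_normal((100, 512, 7, 7))      # pretend backbone outputs
desc = l2_normalize(global_max_pool(maps))
mean, comps = fit_pca(desc, n_components=64)
desc_pca = l2_normalize((desc - mean) @ comps.T)  # re-normalize after PCA
top5 = search(desc_pca, desc_pca[:3], k=5)        # each query's best match is itself
```

In a production setting the `search` step would be replaced by building a FAISS index over `desc_pca` so that queries scale to large galleries.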

