Feature selection using the zebra optimization algorithm for software fault prediction: a study on the bughunter dataset
Authors
Ha Thi Minh Phuong, The University of Danang - Vietnam-Korea University of Information and Communication Technology, Vietnam
Dao Khanh Duy, The University of Danang - Vietnam-Korea University of Information and Communication Technology, Vietnam
Nguyen Do Anh Nhu, The University of Danang – University of Economics, Vietnam
Hoang Thi Thanh Ha, The University of Danang – University of Economics, Vietnam
Abstract
Software fault prediction (SFP) identifies the software modules most likely to contain faults before the testing stage, helping developers allocate quality assurance resources effectively and improve system reliability. A major challenge in SFP is that software fault datasets contain redundant and irrelevant features, which often lower the accuracy of predictive models. To address this, the study introduces a wrapper-based feature selection method that uses the Zebra Optimization Algorithm (ZOA). Experiments on nine BugHunter datasets show that the ZOA-based method consistently outperforms a baseline deep learning model trained on raw data, achieving higher F1-score, precision, and recall. The findings indicate that ZOA is effective at reducing feature redundancy and improving prediction performance. They support the potential of ZOA for SFP, offer practical benefits for software development, and open directions for further study.
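As a rough illustration of what "wrapper-based" means here, the sketch below scores a candidate feature subset by training a classifier on only the selected columns and measuring F1 on a held-out split. Everything in it is an assumption made for illustration and not taken from the study: the toy data, the nearest-centroid classifier (standing in for the deep learning model), and the bit-flip hill climbing in `search_mask`, which is only a placeholder for the ZOA search and does not implement ZOA.

```python
import random

def f1_score(y_true, y_pred):
    """F1 for the positive (faulty) class, labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def centroid_predict(X_train, y_train, X_val):
    """Nearest-centroid classifier: a toy stand-in for the paper's model."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min((0, 1), key=lambda c: dist(x, centroids[c])) for x in X_val]

def fitness(mask, X_train, y_train, X_val, y_val):
    """Wrapper fitness: train and score using only the columns where mask is 1."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot predict anything
    def select(X):
        return [[row[i] for i in cols] for row in X]
    y_pred = centroid_predict(select(X_train), y_train, select(X_val))
    return f1_score(y_val, y_pred)

def search_mask(n_features, score, iterations=200, seed=0):
    """Bit-flip hill climbing over binary masks (placeholder search, NOT ZOA)."""
    rng = random.Random(seed)
    best = [1] * n_features          # start from the full feature set
    best_score = score(best)
    for _ in range(iterations):
        cand = best[:]
        i = rng.randrange(n_features)
        cand[i] = 1 - cand[i]        # toggle one feature in or out
        s = score(cand)
        if s >= best_score:          # keep the candidate if it is no worse
            best, best_score = cand, s
    return best, best_score

def make_toy_data(n, seed):
    """Hypothetical data: column 0 is informative, columns 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = label * 2.0 + rng.gauss(0, 0.5)
        X.append([informative] + [rng.gauss(0, 5.0) for _ in range(3)])
        y.append(label)
    return X, y

X_train, y_train = make_toy_data(200, seed=1)
X_val, y_val = make_toy_data(100, seed=2)
score = lambda m: fitness(m, X_train, y_train, X_val, y_val)
baseline = score([1, 1, 1, 1])       # all features, no selection
best_mask, best_score = search_mask(4, score)
print("baseline F1:", baseline)
print("selected mask:", best_mask, "F1:", best_score)
```

Because the search starts from the full feature set and only accepts candidates that score at least as well, the selected subset never scores below the all-features baseline on the validation split. A real evaluation would still need a separate test set, since the subset is tuned against the validation data.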

