Software Defect Prediction For Quality Evaluation Using Learning Techniques Ensemble Stacking
DOI:
https://doi.org/10.35585/inspir.v13i2.58
Keywords:
Software defects, Prediction, Feature Selection, SMOTE, Hyperparameter Tuning
Abstract
This research aims to improve software quality and the effectiveness of zakat management at the National Amil Zakat Agency (BAZNAS) by developing a software defect prediction model (SDPM). We applied machine learning techniques with an ensemble stacking approach to the "Masjid Tower" dataset, which contains 228 records and 34 attributes. Preprocessing involved label encoding, feature selection with Pearson correlation, standard normalization, and SMOTE to handle data imbalance. We tuned hyperparameters with grid search cross-validation for machine learning algorithms such as AdaBoost and Gradient Boosting. The ensemble stacking approach, combining Gradient Boosting, AdaBoost, Decision Tree, Bayesian Ridge, and a LightGBM meta-learner, achieved an R² score of 0.97, an MAE of 0.037, and an MSE of 0.006. These results indicate that ensemble stacking can predict software defects accurately, can provide useful guidance for zakat management software and other applications, and has the potential to improve software quality and the effectiveness of BAZNAS in managing zakat.
License
Copyright (c) 2023 Muhammad Romadhona Kusuma Kusuma, Windu Gata, Sigit Kurniawan, Dedi Dwi Saputra, Supriadi Panggabean
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.