

Article title

Experiments on software error prediction using Decision Tree and Random Forest algorithms

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Machine learning algorithms are widely used to assess error-proneness in software. We conducted several error-prediction experiments on the public PROMISE repository using the Decision Tree and Random Forest algorithms. We also examined techniques aimed at improving the performance and accuracy of the model, such as oversampling, hyperparameter optimization, and threshold adjustment. The outcome of our experiments suggests that the Random Forest algorithm, with 100-1000 trees, can achieve high values of evaluation metrics such as accuracy and balanced accuracy. However, it has to be combined with techniques countering the imbalance of the datasets used in order to also ensure the high precision and recall that correspond to correct detection of erroneous software. Additionally, oversampling and hyperparameter optimization could be reliably applied to the algorithm, while the threshold adjustment technique was not found to be consistent.
Year
Volume
Pages
865--869
Physical description
Bibliography: 33 items, illustrations, tables, charts
Authors
  • Warsaw University of Technology, Institute of Computer Science, Nowowiejska 15/19 00-665 Warsaw, Poland
  • Warsaw University of Technology, Institute of Computer Science, Nowowiejska 15/19 00-665 Warsaw, Poland
Bibliography
  • 1. F. Elberzhager, A. Rosbach, R. Eschbach, J. Münch, “Reducing Test Effort: A Systematic Mapping Study on Existing Approaches”, Information and Software Technology, vol. 54, no. 10, pp. 1092-1106, 2012.
  • 2. K. Bareja, A. Singhal, “A Review of Estimation Techniques to Reduce Testing Efforts in Software Development”, http://dx.doi.org/10.1109/ACCT.2015.110, 2015.
  • 3. J. Hryszko, L. Madeyski, “Cost Effectiveness of Software Defect Prediction in an Industrial Project”, http://dx.doi.org/10.1515/fcds-2018-0002, 2018.
  • 4. Y. Z. Bala, P. A. Samat, K. Y. Sharif, N. Manshor, “Current Software Defect Prediction: A Systematic Review”, http://dx.doi.org/10.1109/AiIC54368.2022.99114586, 2022.
  • 5. F. Matloob et al., “Software Defect Prediction Using Ensemble Learning: A Systematic Literature Review”, http://dx.doi.org/10.1109/ACCESS.2021.3095559, 2021.
  • 6. Y. Zhao, K. Damevski, H. Chen, “A Systematic Survey of Just-in-Time Software Defect Prediction”, http://dx.doi.org/10.1145/3567550, 2023.
  • 7. T. Menzies, J. DiStefano, A. Orrego, R. Chapman, “Assessing predictors of software defects”, in Proc. Predictive Software Models Workshop, pp. 1-5, 2004.
  • 8. G. Boetticher, T. Menzies, T. Ostrand, PROMISE Repository of Empirical Software Engineering Data, West Virginia University, Department of Computer Science, 2007.
  • 9. C. Catal, B. Diri, B. Ozumut, “An artificial immune system approach for fault prediction in object oriented software”, pp. 238-245, http://dx.doi.org/10.1109/DEPCOS-RELCOMEX, 2007.
  • 10. C. Catal, B. Diri, “Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem”, http://dx.doi.org/10.1016/j.ins.2008.12.001, 2009.
  • 11. J. Brownlee, “Clonal selection theory & CLONALG. The clonal selection classification algorithm”, Technical Report 2-02, Swinburne University of Technology, 2005.
  • 12. J. H. Carter, “The immune system as a model for pattern recognition and classification”, http://dx.doi.org/10.1136/jamia.2000.0070028, 2001.
  • 13. L. Breiman, “Bagging predictors”, Mach. Learn. 24, pp. 123-140, https://doi.org/10.1007/BF00058655, 1996.
  • 14. D. Mundada, A. Murade, O. Vaidya, J. N. Swathi, “Software Fault Prediction Using Artificial Neural Network And Resilient Back Propagation”, Int. J. Comput. Sci. Eng., vol. 5, no. 03, pp. 173-179, 2016.
  • 15. Z. Xiang, L. Zhang, “Research on an Optimized C4.5 Algorithm Based on Rough Set Theory”, http://dx.doi.org/10.1109/ICMeCG.2012.74, 2012.
  • 16. P. Bishnu, V. Bhattacherjee, “Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm”, pp. 1146-1150, http://dx.doi.org/10.1109/TKDE.2011.163, 2012.
  • 17. P. Bishnu, V. Bhattacherjee, “Outlier Detection Technique Using Quad Tree”, in Proc. Int'l Conf. Computer Comm. Control and Information Technology, pp. 143-148, 2009.
  • 18. A. Okutan, O. Taner, “Software defect prediction using Bayesian networks”, http://dx.doi.org/10.1007/s10664-012-9218-8, 2014.
  • 19. P. Kumudha, R. Venkatesan, “Cost-Sensitive Radial Basis Function Neural Network Classifier for Software Defect Prediction”, http://dx.doi.org/10.1155/2016/2401496, 2016.
  • 20. S. Gupta, D. Gupta, “Fault Prediction using Metric Threshold Value of Object Oriented Systems”, International Journal of Engineering Science and Computing, vol. 7, no. 6, pp. 13629-13643, 2017.
  • 21. E. Erturk, E. Akcapinar, “Iterative software fault prediction with a hybrid approach”, http://dx.doi.org/10.1016/j.asoc.2016.08.025, 2016.
  • 22. J. S. R. Jang, “ANFIS: adaptive-network-based fuzzy inference system”, http://dx.doi.org/10.1109/21.256541, 1993.
  • 23. F. Alighardashi, M. A. Z. Chahooki, “The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction”, http://dx.doi.org/10.22385/jctecs.v8i0.96, 2016.
  • 24. C. Lakshmi Prabha, N. Shivakumar, “Software Defect Prediction Using Machine Learning Techniques”, in Proc. of the Fourth International Conference on Trends in Electronics and Informatics, IEEE Xplore Part Number: CFP20J32-ART, ISBN: 978-1-7281-5518-0, 2020.
  • 25. Y. Shen, S. Hu, S. Cai, M. Chen, “Software Defect Prediction based on Bayesian Optimization Random Forest”, http://dx.doi.org/10.1109/DSA56465.2022.00149, 2022.
  • 26. T. F. Husin, M. R. Pribadi, Yohannes, “Implementation of LSSVM in Classification of Software Defect Prediction Data with Feature Selection”, in Proc. 9th Int. Conf. on Electrical Engineering, Computer Science and Informatics (EECSI 2022), pp. 126-131, 2022.
  • 27. MD. A. Jahangir, MD. A. Tajwar, W. Marma, “Intelligent Software Bug Prediction: An Empirical Approach”, http://dx.doi.org/10.1109/ICREST57604.2023.10070026, 2023.
  • 28. Python Core Team, “Python: A dynamic, open source programming language”, Python Software Foundation, accessed 28.04.2022, <https://www.python.org/>
  • 29. C. R. Harris, K. J. Millman, S. J. van der Walt et al., “Array programming with NumPy”, Nature 585, pp. 357-362, http://dx.doi.org/10.1038/s41586-020-2649-2, 2020.
  • 30. W. McKinney, “Data structures for statistical computing in python”, in Proc. of the 9th Python in Science Conference, vol. 445, pp. 56-61, http://dx.doi.org/10.25080/Majora-92bf1922-00a, 2010.
  • 31. F. Pedregosa et al., “Scikit-learn: Machine Learning in Python”, Journal of Machine Learning Research 12, pp. 2825-2830, 2011.
  • 32. G. Lemaître, F. Nogueira, C. K. Aridas, “Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning”, Journal of Machine Learning Research 17, pp. 1-5, http://dx.doi.org/10.48550/arXiv.1609.06570, 2017.
  • 33. N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique”, Journal of Artificial Intelligence Research, pp. 321-357, 2002.
Notes
1. Thematic Tracks Short Papers
2. Record compiled with funds from MEiN (Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the "Social Responsibility of Science" programme - module: Popularization of science and promotion of sport (2024).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-8eee7233-98ef-40ed-8a01-993835efa49e