


2024 | 34 | 4 | 157-183
Article title

Classification with machine learning algorithms after hybrid feature selection in imbalanced data sets

Title variants
Publication languages
EN
Abstracts
EN
The efficacy of machine learning algorithms depends significantly on the adequacy and relevance of the features in the data set; hence, feature selection precedes the classification process. In this study, a hybrid feature selection approach integrating filter and wrapper methods was employed. This approach not only enhances classification accuracy beyond what filter methods alone achieve, but also reduces processing time compared with exclusive reliance on wrapper methods. The results indicate a general improvement in algorithm performance when the hybrid feature selection approach is applied. The study used the Taiwanese Bankruptcy and Statlog (German Credit Data) datasets from the UCI Machine Learning Repository. Both datasets exhibit an imbalanced class distribution, necessitating data preprocessing that accounts for this imbalance. After addressing the imbalance, feature selection and the subsequent classification processes were carried out.
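The filter-then-wrapper pipeline the abstract describes can be sketched as follows. This is a minimal illustrative sketch only: the correlation filter, greedy forward wrapper, nearest-centroid scorer, and toy data below are assumptions for demonstration, not the paper's actual choice of filter method, wrapper method, or classifiers, which the abstract does not specify.

```python
import random

def filter_step(X, y, keep):
    """Filter stage: rank features by absolute Pearson correlation with the
    label and keep the top `keep` candidates (cheap, classifier-independent)."""
    def abs_corr(j):
        col = [row[j] for row in X]
        n = len(y)
        mx, my = sum(col) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        sx = sum((a - mx) ** 2 for a in col) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return abs(cov / (sx * sy)) if sx and sy else 0.0
    ranked = sorted(range(len(X[0])), key=abs_corr, reverse=True)
    return ranked[:keep]

def wrapper_step(candidates, score_fn):
    """Wrapper stage: greedy forward selection over the filtered candidates,
    adding a feature only while it improves the classifier's score."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            s = score_fn(selected + [j])
            if s > best:
                selected, best = selected + [j], s
                improved = True
    return selected, best

# Hypothetical toy data with an imbalanced label (roughly 30/70 split):
# the label depends only on feature 0; features 1-4 are noise.
random.seed(0)
X = [[random.random() for _ in range(5)] for _ in range(60)]
y = [1 if row[0] > 0.7 else 0 for row in X]

def score_fn(feats):
    """Training accuracy of a nearest-centroid classifier on `feats`
    (a stand-in for whatever classifier the wrapper evaluates)."""
    if not feats:
        return 0.0
    cent = {}
    for c in (0, 1):
        rows = [r for r, lab in zip(X, y) if lab == c]
        cent[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    def pred(row):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(feats, cent[c]))
                for c in cent}
        return min(dist, key=dist.get)
    return sum(pred(r) == lab for r, lab in zip(X, y)) / len(y)

candidates = filter_step(X, y, keep=3)              # cheap pre-screening
selected, acc = wrapper_step(candidates, score_fn)  # costly refinement
```

The hybrid gain comes from the division of labor: the filter discards most features with a cheap statistic, so the expensive wrapper search runs over only a few candidates.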
Authors
author
  • Department of Econometrics, Faculty of Economics and Business Administration, Dokuz Eylul University, İzmir, Turkey
Document type
Identifiers
YADDA identifier
bwmeta1.element.desklight-9bc9a647-5f49-4e37-a581-61fc9b8f3014