Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
To address the high feature dimensionality of Parkinson’s disease medical data, this paper introduces SHapley Additive exPlanations (SHAP) values for feature selection on a Parkinson’s disease medical dataset. SHAP values are combined with each of four classifiers, namely deep forest (gcForest), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and random forest (RF), and the resulting models are applied to Parkinson’s disease diagnosis. First, the classifier is used to compute the SHAP values that measure how much each feature contributes. Next, the features that contribute significantly to the classification task are selected. Finally, the feature-selected data are used as input to the classifier, which classifies the Parkinson’s disease dataset for diagnosis. The experimental results show that, compared with the F-score, analysis of variance (ANOVA-F) and mutual information (MI) feature selection methods, the four models based on SHAP-value feature selection achieve good classification results. The SHAP-gcForest model (SHAP feature selection combined with gcForest) achieves a classification accuracy of 91.78% and an F1-score of 0.945 when 150 features are selected. The SHAP-LightGBM model (SHAP feature selection combined with LightGBM) achieves a classification accuracy of 91.62% and an F1-score of 0.945 when 50 features are selected. Its classification performance is second only to that of SHAP-gcForest, but SHAP-LightGBM is more computationally efficient. Finally, the effectiveness of the proposed method is verified by comparison with results reported in the existing literature. The findings demonstrate that machine learning with SHAP-value feature selection performs well in the diagnosis of Parkinson’s disease and provides a reference for physicians in the diagnosis and prevention of Parkinson’s disease.
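The selection step summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several assumptions apply. It computes exact Shapley values by brute-force enumeration of feature coalitions for a toy scoring function. Features outside a coalition are replaced by a baseline vector, which is one common convention. Features are then ranked by mean absolute SHAP value, an assumed criterion for "significant contribution", and the top k are kept. The names `shapley_values` and `select_top_k`, the toy model and the data are hypothetical. For tree ensembles such as those named above, a tree-specific SHAP algorithm would normally replace the exponential enumeration used here.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: maps one feature vector to a real-valued score.
Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of `model` at `x` by enumerating all coalitions.

    Features outside a coalition take their `baseline` value (an assumed
    convention). Cost grows as 2**n, so this is only practical for small n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                z_without = [x[j] if j in present else baseline[j] for j in range(n)]
                present.add(i)
                z_with = [x[j] if j in present else baseline[j] for j in range(n)]
                # Weighted marginal contribution of feature i to this coalition
                phi[i] += weight * (model(z_with) - model(z_without))
    return phi


def select_top_k(model: Model, X: Sequence[Sequence[float]],
                 baseline: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Rank features by mean |SHAP| over the samples in X and keep the top k."""
    n = len(X[0])
    importance = [0.0] * n
    for x in X:
        for j, value in enumerate(shapley_values(model, x, baseline)):
            importance[j] += abs(value) / len(X)
    ranked = sorted(range(n), key=lambda j: importance[j], reverse=True)
    return ranked[:k], importance


if __name__ == "__main__":
    # Toy additive model: feature 1 is ignored, so its SHAP value is always 0.
    def toy_model(z: Sequence[float]) -> float:
        return 3.0 * z[0] + 0.5 * z[2]

    X = [[1.0, 5.0, 2.0], [2.0, -1.0, 0.0], [0.0, 3.0, 4.0]]
    baseline = [0.0, 0.0, 0.0]
    selected, importance = select_top_k(toy_model, X, baseline, k=2)
    print(selected)  # -> [0, 2]
```

Exact enumeration needs on the order of n·2^n model evaluations. It is therefore only a correctness reference. With the hundreds of features implied by selecting 150 or 50 of them, an efficient or approximate SHAP algorithm would be required. The retained column indices would then be used to slice the training data before the classifier is refit.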
Publisher
Journal
Year
Volume
Pages
856–869
Physical description
Bibliography: 51 items, figures, tables, charts
Contributors
author
- School of Mathematics and Physics, China University of Geosciences, Wuhan, China
author
- School of Mathematics and Physics, China University of Geosciences, 430074 Wuhan, China
author
- School of Mathematics and Physics, China University of Geosciences, Wuhan, China
author
- School of Mathematics and Physics, China University of Geosciences, Wuhan, China
Bibliography
- [1] Zhu D, Liu GY, Lv Z, Wen SR, Bi S, Wang WZ. Inverse associations of outdoor activity and vitamin D intake with the risk of Parkinson’s disease. J Zhejiang Univ Sci B 2014;15(10):923–7.
- [2] Naranjo L, Pérez CJ, Martín J, Campos-Roca Y. A two-stage variable selection and classification approach for Parkinson’s disease detection by using voice recording replications. Comput Methods Programs Biomed 2017;142:147–56. https://doi.org/10.1016/j.cmpb.2017.02.019.
- [3] Sakar CO, Serbes G, Gunduz A, Tunc HC, Nizam H, Sakar BE, et al. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl Soft Comput 2019;74:255–63.
- [4] Tsanas A, Little MA, McSharry PE, Ramig LO. Accurate telemonitoring of Parkinson’s disease progression by noninvasive speech tests. IEEE Trans Biomed Eng 2010;57(4):884–93. https://doi.org/10.1109/TBME.2009.2036000.
- [5] Karan B, Sahu SS, Orozco-Arroyave JR. An investigation about the relationship between dysarthria level of speech and the neurological state of Parkinson’s patients. Biocybernet Biomed Eng 2022. https://doi.org/10.1016/j.bbe.2022.04.003.
- [6] Qiu YF, Guo L. Prediction of diabetic complications based on disequilibrium data. Data Anal Knowl Discov 2021;5(02):116–28. https://doi.org/10.11925/infotech.2096-3467.2020.0353.
- [7] Deharab ED, Ghaderyan P. Graphical representation and variability quantification of handwriting signals: New tools for Parkinson’s disease detection. Biocybernet Biomed Eng 2022;42:158–72. https://doi.org/10.1016/j.bbe.2021.12.007.
- [8] Khare SK, Bajaj V, Acharya UR. Detection of Parkinson’s disease using automated tunable Q wavelet transform technique with EEG signals. Biocybernet Biomed Eng 2021;41(2):679–89. https://doi.org/10.1016/j.bbe.2021.04.008.
- [9] Karan B, Sahu SS. An improved framework for Parkinson’s disease prediction using Variational Mode Decomposition-Hilbert spectrum of speech signal. Biocybernet Biomed Eng 2021;41(2):717–32. https://doi.org/10.1016/j.bbe.2021.04.014.
- [10] AlMahadin G, Lotfi A, Carthy MM. Enhanced Parkinson’s disease tremor severity classification by combining signal processing with resampling techniques. SN Comput Sci 2022;3(1):1–21. https://doi.org/10.1007/s42979-021-00953-6.
- [11] Zhang ZJ, Sun JS, Chen BJ. Dynamic convergence differential neural network diagnosis system for Parkinson’s disease. Control Theory Appl 2021:1–7. https://doi.org/10.7641/CTA.2021.00770.
- [12] Özgür E, Uyanık HU, Şenel S, Uzun L. Immunoaffinity biosensor for neurofilament light chain detection and its use in Parkinson’s diagnosis. Mater Sci Eng, B 2020;256:114545. https://doi.org/10.1016/j.mseb.2020.114545.
- [13] Sharma P, Jain R, Sharma M, Gupta D. Parkinson’s diagnosis using ant-lion optimisation algorithm. Int J Innov Comput Appl 2019;10(3–4):138–46. https://doi.org/10.1504/IJICA.2019.103370.
- [14] Khoury N, Attal F, Amirat Y, Oukhellou L, Mohammed S. Data-driven based approach to aid Parkinson’s disease diagnosis. Sensors 2019;19(2):242–68. https://doi.org/10.3390/s19020242.
- [15] Er O, Cetin O, Bascil MS, Temurtas F. A comparative study on Parkinson’s disease diagnosis using neural networks and artificial immune system. J Med Imaging Health Inf 2016;6(1):264–8. https://doi.org/10.1166/jmihi.2016.1606.
- [16] Pang HZ, Yu ZY, Yu HM, Cao JB, Li YM, Guo MR, et al. Use of machine learning method on automatic classification of motor subtype of Parkinson’s disease based on multilevel indices of rs-fMRI. Parkinsonism Related Disorders 2021;90:65–72.
- [17] Su C, Hou Y, Brendel M. Comprehensively modeling heterogeneous symptom progression for Parkinson’s disease subtyping. medRxiv 2021. https://doi.org/10.1101/2021.07.18.21260731.
- [18] Kevin L, Man Z, Wang D, Cao Z. Classification of bioinformatics dataset using finite impulse response extreme learning machine for cancer diagnosis. Neural Comput Appl 2013;22(3–4):457–68. https://doi.org/10.1007/s00521-012-0847-z.
- [19] Xie J, Lei J, Xie W, Shi Y, Liu X. Two-stage hybrid feature selection algorithms for diagnosing erythemato-squamous diseases. Health Inf Sci Syst 2013;1(1):1–14. https://doi.org/10.1186/2047-2501-1-10.
- [20] Wang X, Hu X. A review of feature selection in the classification of small samples with high dimension. J Comput Appl 2017;39(09):2433–8, 2448.
- [21] Hancer E, Xue B, Zhang M. Differential evolution for filter feature selection based on information theory and feature ranking. Knowl-Based Syst 2018;140:103–19. https://doi.org/10.1016/j.knosys.2017.10.028.
- [22] Ghosh M, Guha R, Sarkar R. A wrapper-filter feature selection technique based on ant colony optimization. Neural Comput Appl 2020;32(12):7839–57. https://doi.org/10.1016/j.asoc.2016.01.044.
- [23] Albashish D, Hammouri AI, Braik M, Atwan J, Sahran S. Binary biogeography-based optimization based SVM-RFE for feature selection. Appl Soft Comput 2021;101:107026. https://doi.org/10.1016/j.asoc.2020.107026.
- [24] Ghosh P, Azam S, Jonkman M, Karim A, Shamrat FMJM, Ignatious E, et al. Efficient prediction of cardiovascular disease using machine learning algorithms with relief and LASSO feature selection techniques. IEEE Access 2021;9:19304–26.
- [25] Dai Y, Guo X, Wang M, Sun Y. Feature selection method for high dimensional biomedical data based on shuffled frog leaping algorithm. Appl Res Comput 2021;38(04):1062–8. https://doi.org/10.19734/j.issn.1001-3695.2020.04.0115.
- [26] Li X, Zhang J, Safara F. Improving the accuracy of diabetes diagnosis applications through a hybrid feature selection algorithm. Neural Process Lett 2021:11–7. https://doi.org/10.1007/S11063-021-10491-0.
- [27] Lundberg S, Lee SI. A unified approach to interpreting model predictions. 31st Conference on Neural Information Processing Systems. Long Beach, 2017.
- [28] Bi Y, Xiang D, Ge Z, Li F, Jia C, Song J. An interpretable prediction model for identifying N7-methylguanosine sites based on XGBoost and SHAP. Mol Ther-Nucl Acids 2020;22:362–72. https://doi.org/10.1016/J.OMTN.2020.08.022.
- [29] Marcílio WE, Eler DM. From explanations to feature selection: assessing SHAP value as feature selection mechanism. 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images. Brazil 2020;340–347. https://doi.org/10.1109/SIBGRAPI51738.2020.00053.
- [30] Oh S, Park Y, Cho KJ, Kim SJ. Explainable machine learning model for glaucoma diagnosis and its interpretation. Diagnostics 2021;11(3):510–3. https://doi.org/10.3390/diagnostics11030510.
- [31] Zhang Y, Yang D, Liu Z, Chen C, Ge M, Li X, et al. An explainable supervised machine learning predictor of acute kidney injury after adult deceased donor liver transplantation. J Transl Med 2021;19(1). https://doi.org/10.1186/s12967-021-02990-4.
- [32] Rashed-Al-Mahfuz Md, Moni MA, Lio’ P, Islam SMS, Berkovsky S, Khushi M, et al. Deep convolutional neural networks based ECG beats classification to diagnose cardiovascular conditions. Biomed Eng Lett 2021;11(2):147–62.
- [33] Hogan CA, Rajpurkar P, Sowrirajan H, Phillips NA, Le AT, Wu M, et al. Nasopharyngeal metabolomics and machine learning approach for the diagnosis of influenza. EBioMedicine 2021;71:103546. https://doi.org/10.1016/j.ebiom.2021.103546.
- [34] Nohara Y, Matsumoto K, Soejima H, Nakashima N. Explanation of machine learning models using Shapley additive explanation and application for real data in hospital. Comput Methods Programs Biomed 2022;214:106584. https://doi.org/10.1016/j.cmpb.2021.106584.
- [35] Bloch L, Friedrich CM. Data analysis with Shapley values for automatic subject selection in Alzheimer’s disease data sets using interpretable machine learning. Alzheimer’s Res Ther 2021;13(1):1–30. https://doi.org/10.1186/s13195-021-00879-4.
- [36] Makarious MB, Leonard HL, Vitale D. Multi-modality machine learning predicting Parkinson’s disease. npj Parkinson’s Dis 2022;8(1):1–13. https://doi.org/10.1101/2021.03.05.434104.
- [37] Tarnanas I, Vlamos P, Harms DR. Can detection and prediction models for Alzheimer’s Disease be applied to Prodromal Parkinson’s Disease using explainable artificial intelligence? A brief report on Digital Neuro Signatures. Open Research Europe 2022;1:146. https://doi.org/10.12688/openreseurope.14216.2.
- [38] Pianpanit T, Lolak S, Sawangjai P, Sudhawiyangkul T, Wilaiprasitporn T. Parkinson’s disease recognition using SPECT image and interpretable AI: A tutorial. IEEE Sens J 2021;21(20):22304–16.
- [39] Zhou ZH, Feng J. Deep Forest: towards an alternative to deep neural networks. 2017.
- [40] Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco 2016;785–94.
- [41] Ke GL, Meng Q, Finley T. LightGBM: a highly efficient gradient boosting decision tree. 31st Conference on Neural Information Processing Systems. Long Beach 2017.
- [42] Biau G. Analysis of a random forests model. J Mach Learn Res 2012;13(1):1063–95. https://doi.org/10.1109/TASE.2012.2183739.
- [43] Song QJ, Jiang HY, Liu J. Feature selection based on FDA and F-score for multi-class classification. Expert Syst Appl 2017;81:22–7. https://doi.org/10.1016/j.eswa.2017.02.049.
- [44] Xie J, Zheng Q, Ji X. Integrated feature selection algorithm based on F-score and kernel extreme learning machine. J Shaanxi Normal Univ (Natural Science Edition) 2020;48(02):1–8. https://doi.org/10.15983/j.cnki.jsnu.2020.01.001.
- [45] Shakeela S, Shankar NS, Reddy PM. Optimal ensemble learning based on distinctive feature selection by univariate ANOVA-F statistics for IDS. Int J Electron Telecommun 2021;67(2):267–75. https://doi.org/10.24425/ijet.2021.135975.
- [46] Dhindsa A, Bhatia S, Agrawal S, Sohi BS. An improvised machine learning model based on mutual information feature selection approach for microbes classification. Entropy 2021;23(2):257–72. https://doi.org/10.3390/e23020257.
- [47] Pedregosa F, Varoquaux G, Gramfort A. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;12:2825–30.
- [48] Polat K. A hybrid approach to Parkinson disease classification using speech signal: the combination of SMOTE and random forests. 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT). IEEE 2019:1–3. https://doi.org/10.1109/EBBT.2019.8741725.
- [49] Xiong Y, Lu Y. Deep feature extraction from the vocal vectors using sparse autoencoders for Parkinson’s classification. IEEE Access 2020;8:27821–30. https://doi.org/10.1109/ACCESS.2020.2968177.
- [50] El-Hasnony IM, Barakat SI, Mostafa RR. Optimized ANFIS model using hybrid metaheuristic algorithms for Parkinson’s disease prediction in IoT environment. IEEE Access 2020;8:119252–70. https://doi.org/10.1109/ACCESS.2020.3005614.
- [51] Gunduz H. An efficient dimensionality reduction method using filter-based feature selection and variational autoencoders on Parkinson’s disease classification. Biomed Signal Process Control 2021;66:102452. https://doi.org/10.1016/j.bspc.2021.102452.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-a8574c3d-8a54-43ad-aa85-a6abd749f233