Feature projection k-NN classifier model for imbalanced and incomplete medical data

Porwik, P.; Orczyk, T.; Lewandowski, M.; Cholewa, M.

doi:10.1016/j.bbe.2016.08.002

Article - details

Abstract

Many datasets, especially historical medical data, are incomplete. Poor data quality can significantly hamper medical diagnosis and is a bottleneck of medical support systems, which are now often used in medical diagnosis. Even a large amount of data can be unsuitable when it is imbalanced, missing, or corrupted. In some cases these problems can be overcome by machine learning algorithms designed for predictive modeling. The proposed approach was tested on real medical data and on some benchmark datasets from the UCI repository. From a medical point of view, liver fibrosis is difficult to treat and has a significant social and economic impact. Stages of liver fibrosis are diagnosed by clinical observation and evaluation, coupled with the so-called METAVIR rating scale. However, these methods may be insufficient, especially for recognizing the phase of the disease. This paper describes a newly developed algorithm for non-invasive fibrosis stage recognition using machine learning methods: a classification model based on a feature projection k-NN classifier. This solution allows data characteristics to be extracted from historical data that may be incomplete and may contain imbalanced (unequal) sets of patients. The proposed solution is based on peripheral blood analysis without any specialized biomarkers; it can be included in medical diagnosis support systems and might be a powerful tool for effective estimation of liver fibrosis stages.
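The abstract names the technique but gives no algorithmic detail. As a rough illustration only, and not the authors' model, the following sketch shows one common reading of k-NN on feature projections: each feature is treated as its own one-dimensional k-NN problem, and the per-feature votes are summed. Missing values (represented here as None) are simply skipped on the affected feature, which is why this family of methods tolerates incomplete records. The function name, the None convention, the toy data, and the stage labels are all assumptions made for the example; any handling of class imbalance is deliberately left out.

```python
from collections import Counter


def knn_feature_projection_predict(train_X, train_y, query, k=3):
    """Predict a label by summing k-NN votes taken separately on each feature.

    Illustrative sketch only. Missing values are None and are ignored
    on the feature where they occur.
    """
    votes = Counter()
    for f, q in enumerate(query):
        if q is None:
            continue  # a missing query value casts no vote on this feature
        # Project the training set onto feature f, dropping missing entries.
        projected = [
            (abs(x[f] - q), label)
            for x, label in zip(train_X, train_y)
            if x[f] is not None
        ]
        projected.sort(key=lambda pair: pair[0])
        # The k nearest training values on this single feature each cast one vote.
        for _, label in projected[:k]:
            votes[label] += 1
    if not votes:
        return None  # nothing could be compared
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two blood-test-like features, with gaps.
train_X = [[1.0, 10.0], [1.2, None], [0.9, 11.0],
           [5.0, 50.0], [5.5, 48.0], [None, 52.0]]
train_y = ["F0", "F0", "F0", "F3", "F3", "F3"]

print(knn_feature_projection_predict(train_X, train_y, [1.1, None]))
print(knn_feature_projection_predict(train_X, train_y, [None, 49.0]))
```

Because every feature votes independently, a record with a missing measurement still contributes through its remaining features, and no imputation step is needed in this simplified form.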

Keywords

liver disease; fibrosis stages; computer aided diagnosis; classifier; features selection method

Publisher

Nałęcz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences
Elsevier

Journal

Biocybernetics and Biomedical Engineering

Year

2016

Volume

Vol. 36, no. 4

Pages

644–656

Physical description

28 references, figures, tables, charts.

Contributors

author

Porwik P.

Computer Systems Department, Institute of Computer Science, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland

author

Orczyk T.

tomasz.orczyk@us.edu.pl

Computer Systems Department, Institute of Computer Science, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland

author

Lewandowski M.

Computer Systems Department, Institute of Computer Science, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland

author

Cholewa M.

Computer Systems Department, Institute of Computer Science, University of Silesia, ul. Bedzinska 39, 41-200 Sosnowiec, Poland

References

[1] Regev A, Berho M, Jeffers L, Milikowski C, Molina E, Pyrsopoulos N, et al. Sampling error and intraobserver variation in liver biopsy in patients with chronic HCV infection. Am J Gastroenterol 2002;97(10):2614–8.
[2] Bedossa P, Dargere D, Paradis V. Sampling variability of liver fibrosis in chronic hepatitis C. Hepatology 2003;38:1449–57.
[3] Czabanski R, Jezewski J, Matonia A, Jezewski M. Computerized analysis of fetal heart rate signals as the predictor of neonatal acidemia. Expert Syst Appl 2012;39(15):11846–60.
[4] Krawczyk B, Woźniak M, Schaefer G. Cost-sensitive decision tree ensembles for effective imbalanced classification. Appl Soft Comput 2014;14C:554–62.
[5] Kuncheva LI. Combining pattern classifiers. Methods and algorithms. New Jersey: Wiley-Interscience; 2004.
[6] Orczyk T, Porwik P, Bernaś M. Medical diagnosis support system based on the ensemble of single-parameter classifiers. J Med Informatics Technol 2014;23:173–9.
[7] Sun Y, Wong AKC, Kamel MS. Classification of imbalanced data: a review. Int J Pattern Recogn Artificial Intellig 2009;23(4):687–719.
[8] Napierała K, Stefanowski J. Addressing imbalanced data with argument based rule learning. Expert Syst Appl 2015;42(24):9468–81.
[9] Yao D, Yang J, Zhan X. An Improved random forest algorithm for class-imbalanced data classification and its application in PAD risk factors analysis. Open Electrical Electronic Eng J 2013;7(Supple 1: M7):62–70.
[10] Steinley D. Curse of dimensionality. In: Salkind N, editor. Encyclopedia of measurement and statistics. SAGE Publications; 2007. p. 210–2.
[11] Porwik P, Sosnowski M, Wesolowski T, Wrobel K. A computational assessment of a blood vessel's compliance: a procedure based on computed tomography coronary angiography. In: Corchado E, Kurzyński M, Woźniak M, editors. Hybrid artificial intelligent systems (HAIS2011). Berlin/Heidelberg: Springer; 2011. p. 428–35. 6678, Lecture Notes in Computer Science.
[12] Kurzynski M, Wolczowski A. Hetero- and Homogeneous Multiclassifier Systems Based on Competence Measure Applied to the Recognition of Hand Grasping Movements. Information Technologies in Biomedicine, vol. 4, Advances in Intelligent Systems and Computing, 284. 2014. pp. 163–74.
[13] Little R, Rubin D. Statistical analysis with missing data. John Wiley & Sons; 1987.
[14] Aha DW, Kibler D, Albert MK. Instance-based learning algorithms. Mach Learning 1991;6:37–66.
[15] Akkus A, Güvenir HA. Weighted k-nearest neighbor classification on feature projections. Proceedings of the 13th International Conference on Machine Learning; 1996. p. 12–9.
[16] Orczyk T, Porwik P. Liver fibrosis diagnosis support system using machine learning methods. Advanced computing and systems for security Advances in intelligent systems and computing, vol. 395. Springer; 2015. p. 111–21.
[17] Foster K, Koprowski R, Skufca J. Machine learning, medical diagnosis, and biomedical engineering research – commentary. BioMed Eng OnLine 2014;13:94.
[18] Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inform Process Manage 2009;45:427–37.
[19] Arzucan O, Levent O, Tunga G. Text categorization with class-based and corpus-based keyword selection. Computer and Information Sciences – ISCIS 2005. Lecture Notes in Computer Science, 3733. 2005. pp. 606–15.
[20] Powers DMW. Evaluation: From precision, recall and f-measure to roc, informedness, markedness & correlation. J Mach Learning Technol 2011;2:37–63.
[21] Kohavi R, John GH. Wrappers for feature subset selection. Artificial Intellig 1997;97(1–2):273–324.
[22] Guyon I, Gunn S, Nikravesh M, Zadeh L. Feature extraction. Foundations and applications. Springer; 2006.
[23] Lichman M. UCI machine learning repository; 2013.
[24] Frank E, Witten IH. Generating accurate rule sets without global optimization. In: Shavlik J, editor. Fifteenth international conference on machine learning. Morgan Kaufmann; 1998. p. 144–51.
[25] Quinlan R. C4.5: programs for machine learning. San Mateo, CA: Morgan Kaufmann Pub; 1993.
[26] John GH, Langley P. Estimating Continuous Distributions in Bayesian Classifiers. Eleventh Conference on Uncertainty in Artificial Intelligence; 1995. pp. 338–45.
[27] Bedossa P, Poynard T. An algorithm for the grading of activity in chronic hepatitis C. The metavir cooperative study group. Hepatology 1996;24:289–93.
[28] Orczyk T, Porwik P. Investigation of the impact of missing value imputation methods on the k-NN classification accuracy. In: Nunez M, Nguyen NT, Camacho D, et al., editors. Computational Collective Intelligence (ICCCI 2015). Book series: lecture notes in artificial intelligence, vol. 9330. 2015. pp. 557–65.

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-695d86ce-7d9b-4d36-a1ba-46e0fb22fbca