Handling class label noise in medical pattern classification systems

Sáez, J. A.; Krawczyk, B.; Woźniak, M.

Abstract

Pattern classification systems play an important role in medical decision support. They allow the data analysis process to be automated and accelerated, while handling complex and massive amounts of information and discovering new knowledge. However, their quality depends on the classification models they build, which require a training set. In supervised classification, each training sample must be supplied with a class label, which is usually done by domain experts or by automatic systems. As neither of these approaches can be deemed flawless, the dataset may be corrupted by class noise. In such a situation, class labels are wrongly assigned to objects, which may negatively affect the classifier training process and impair classification performance. In this contribution, we analyze the usefulness of existing tools for dealing with class noise, known as noise filtering methods, in the context of medical pattern classification. The experiments carried out on several real-world medical datasets demonstrate the importance of noise filtering as a pre-processing step and its beneficial influence on the obtained classification accuracy.
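The abstract describes noise filtering only in general terms. As an illustration, the following Python sketch shows one classic class noise filter, the edited nearest neighbour rule of Wilson (1972), listed as [24] in the bibliography, applied as a pre-processing step before training. It does not reproduce the article's experimental protocol: the function names `inject_class_noise` and `enn_filter`, the uniform label-flipping noise model, the Euclidean distance, k = 3, and the toy data are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def inject_class_noise(labels, rate, classes, seed=0):
    """Simulate uniform class noise (an assumed noise model): relabel a
    fraction `rate` of the samples with a different, randomly chosen class.

    Returns the noisy label list and the sorted indices that were flipped.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(rate * len(labels)))
    flipped = sorted(rng.sample(range(len(labels)), n_flip))
    for i in flipped:
        # Pick any class except the current (correct) one.
        noisy[i] = rng.choice([c for c in classes if c != labels[i]])
    return noisy, flipped


def enn_filter(X, y, k=3):
    """Edited Nearest Neighbour filter (Wilson's editing rule).

    A sample is kept only if the majority label among its k nearest
    neighbours (Euclidean distance, the sample itself excluded) equals
    its own label. Returns the indices of the kept samples.
    """
    keep = []
    for i, xi in enumerate(X):
        # (distance, index) pairs to every other sample; distance ties break by index.
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        votes = Counter(y[j] for _, j in neighbours[:k])
        majority, _ = votes.most_common(1)[0]
        if majority == y[i]:
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two well-separated toy clusters (hypothetical data, not from the article).
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    y = [0] * 5 + [1] * 5
    y_noisy = list(y)
    y_noisy[0] = 1  # one mislabeled sample inside the class-0 cluster
    kept = enn_filter(X, y_noisy, k=3)
    print(kept)  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
    # The filtered training set would then be passed to any classifier.
    X_clean = [X[i] for i in kept]
    y_clean = [y_noisy[i] for i in kept]
```

Here the single mislabeled sample is removed because all three of its nearest neighbours carry the other label. Like any editing rule, ENN can also discard correctly labeled samples that lie near class boundaries, so the cleaned training set is generally smaller than the original one.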

Keywords

machine learning, pattern classification, class noise, noise filtering, decision support systems

Publisher

University of Silesia, Institute of Informatics, Computer Systems Department

Journal

Journal of Medical Informatics & Technologies

Year

2015

Volume

Vol. 24

Pages

123–130

Physical description

Bibliography: 26 items, figures, tables

Authors

author

Sáez J. A.

jose.saezmunoz@pwr.edu.pl

ENGINE Centre, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

author

Krawczyk B.

bartosz.krawczyk@pwr.edu.pl

Department of Systems and Computer Networks, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

author

Woźniak M.

michal.wozniak@pwr.edu.pl

Department of Systems and Computer Networks, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

References

[1] AZAR A. T., HASSANIEN A. E. Dimensionality reduction of medical big data using neural-fuzzy classifier. Soft Comput., 2015, Vol. 19. pp. 1115–1127.
[2] BATISTA G. E. A. P. A., MONARD M. C. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence, 2003, Vol. 17. pp. 519–533.
[3] BRODLEY C. E., FRIEDL M. A. Identifying Mislabeled Training Data. Journal of Artificial Intelligence Research, 1999, Vol. 11. pp. 131–167.
[4] CZARNECKI W. M. Weighted tanimoto extreme learning machine with case study in drug discovery. IEEE Comp. Int. Mag., 2015, Vol. 10. pp. 19–29.
[5] DEVIJVER P. On the editing rate of the MULTIEDIT algorithm. Pattern Recognition Letters, 1986, Vol. 4. pp. 9–12.
[6] GARCIA L. P. F., DE CARVALHO A. C. P. L. F., LORENA A. C. Effect of label noise in the complexity of classification problems. Neurocomputing, 2015, Vol. 160. pp. 108–119.
[7] HUANG G., ZHANG Y., CAO J., STEYN M., TARAPOREWALLA K. On line mining abnormal period patterns from multiple medical sensor data streams. World Wide Web, 2014, Vol. 17. pp. 569–587.
[8] KHOSHGOFTAAR T. M., REBOURS P. Improving software quality prediction by noise filtering techniques. Journal of Computer Science and Technology, 2007, Vol. 22. pp. 387–396.
[9] KONONENKO I. Machine learning for medical diagnosis: history, state of the art and perspective. Artificial Intelligence in Medicine, 2001, Vol. 23. pp. 89–109.
[10] KRAWCZYK B., FILIPCZUK P. Cytological image analysis with firefly nuclei detection and hybrid one-class classification decomposition. Engineering Applications of Artificial Intelligence, 2014, Vol. 31. pp. 126–135.
[11] KRAWCZYK B., SCHAEFER G. A hybrid classifier committee for analysing asymmetry features in breast thermograms. Appl. Soft Comput., 2014, Vol. 20. pp. 112–118.
[12] KRAWCZYK B., WOŹNIAK M. Hypertension type classification using hierarchical ensemble of one-class classifiers for imbalanced data. ICT Innovations 2014, 2015, Vol. 311 of Advances in Intelligent Systems and Computing. pp. 341–349.
[13] LE CESSIE S., VAN HOUWELINGEN J. Ridge estimators in logistic regression. Applied Statistics, 1992, Vol. 41. pp. 191–201.
[14] MCLACHLAN G. J. Discriminant Analysis and Statistical Pattern Recognition (Wiley Series in Probability and Statistics). 2004. Wiley-Interscience.
[15] POMBO N., ARAÚJO P., VIANA J. Knowledge discovery in clinical decision support systems for pain management: A systematic review. Artificial Intelligence in Medicine, 2014, Vol. 60. pp. 1–11.
[16] QUINLAN J. R. C4.5: programs for machine learning. 1993. Morgan Kaufmann Publishers, San Francisco, CA, USA.
[17] SÁEZ J. A., GALAR M., LUENGO J., HERRERA F. Analyzing the presence of noise in multi-class problems: alleviating its influence with the one-vs-one decomposition. Knowl. Inf. Syst., 2014, Vol. 38. pp. 179–206.
[18] SÁEZ J. A., GALAR M., LUENGO J., HERRERA F. INFFC: an iterative class noise filter based on the fusion of classifiers with noise sensitivity control. Information Fusion, 2016, Vol. 27. pp. 19–32.
[19] SÁEZ J. A., LUENGO J., HERRERA F. Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification. Pattern Recognition, 2013, Vol. 46. pp. 355–364.
[20] SÁNCHEZ J., BARANDELA R., MÁRQUES A., ALEJO R., BADENAS J. Analysis of new techniques to obtain quality training sets. Pattern Recognition Letters, 2003, Vol. 24. pp. 1015–1022.
[21] SÁNCHEZ J., PLA F., FERRI F. Prototype selection for the nearest neighbor rule through proximity graphs. Pattern Recognition Letters, 1997, Vol. 18. pp. 507–513.
[22] SANZ J., GALAR M., JURIO A., BRUGOS A., PAGOLA M., BUSTINCE H. Medical diagnosis of cardiovascular diseases using an interval-valued fuzzy rule-based classification system. Appl. Soft Comput., 2014, Vol. 20. pp. 103–111.
[23] TENG C.-M. Correcting Noisy Data. Proceedings of the Sixteenth International Conference on Machine Learning, 1999. Morgan Kaufmann Publishers, San Francisco, CA, USA, pp. 239–248.
[24] WILSON D. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems and Man and Cybernetics, 1972, Vol. 2. pp. 408–421.
[25] WILSON D. R., MARTINEZ T. R. Improved heterogeneous distance functions. Journal of Artificial Intelligence Research, 1997, Vol. 6. pp. 1–34.
[26] WOLPERT D. The supervised learning no-free-lunch theorems. In Proc. 6th Online World Conference on Soft Computing in Industrial Applications, 2001. pp. 25–42.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-336e14fd-ee7c-4f15-9577-dfe10508d336