Influence of missing data imputation method on the classification accuracy of the medical data

Orczyk, T.; Porwik, P.

Abstract

The aim of this study is to show the dangers of filling in missing data, particularly in medical data. Because many dedicated medical expert systems and medical decision support systems exist, special attention must be paid to the construction of classifiers. Medical data are almost never complete, and completing the missing values requires special care. The safest way of dealing with missing data would be to remove the records with missing parameters and/or to remove the parameters that are missing from the records. Unfortunately, reducing a data set that is already very small is not always an option. The article shows the dangers that arise from data imputation by presenting the influence of selected missing-data filling algorithms on classification accuracy.
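The three options named in the abstract (removing incomplete records, removing incomplete parameters, and filling in the gaps) can be illustrated with a minimal sketch. The toy records, the parameter names and the helper functions below are hypothetical and do not come from the article. Mean imputation is used only as a familiar stand-in, since the abstract does not say which filling algorithms were evaluated.

```python
from statistics import mean

# Hypothetical toy records (not data from the article); None marks a missing value.
records = [
    {"age": 54, "alt": 32.0, "ast": 28.0},
    {"age": 61, "alt": None, "ast": 40.0},
    {"age": 47, "alt": 25.0, "ast": None},
    {"age": 39, "alt": 51.0, "ast": 45.0},
]


def drop_incomplete_records(rows):
    """Keep only the records in which every parameter is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_incomplete_parameters(rows):
    """Keep only the parameters that are present in every record."""
    complete = [k for k in rows[0] if all(r[k] is not None for r in rows)]
    return [{k: r[k] for k in complete} for r in rows]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values
    of the same parameter (an assumed, illustrative filling method)."""
    means = {
        k: mean(r[k] for r in rows if r[k] is not None)
        for k in rows[0]
    }
    return [{k: (means[k] if r[k] is None else r[k]) for k in r} for r in rows]


if __name__ == "__main__":
    print(len(drop_incomplete_records(records)), "complete records remain")
    print(list(drop_incomplete_parameters(records)[0]), "parameters remain")
    print(impute_mean(records)[1])
```

On this toy set, record removal halves the data and parameter removal leaves a single column, which is the cost the abstract refers to. Imputation keeps every record, but the filled-in values are invented and may mislead a classifier trained on them.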

Keywords

medical data analysis; missing data; data imputation; classification efficiency

Publisher

University of Silesia, Institute of Informatics, Computer Systems Department

Journal

Journal of Medical Informatics & Technologies

Year

2013

Volume

Vol. 22

Pages

111–116

Physical description

Bibliography: 11 items; tables; charts.

Contributors

author

Orczyk T.

tomasz.orczyk@us.edu.pl

University of Silesia

author

Porwik P.

piotr.porwik@us.edu.pl

University of Silesia

Bibliography

[1] BERTHOLD M. R., CEBRON N., DILL F., GABRIEL T. R., KÖTTER T., MEINL T., OHL P., SIEB C., THIEL K., WISWEDEL B., KNIME: The Konstanz Information Miner, Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007), Springer, 2007.
[2] BREIMAN L., Random Forests, Machine Learning, 2001, Vol. 45(1), pp. 5–32.
[3] DIETTERICH T. G., An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization, Machine Learning, 2000, Vol. 40(2), pp. 139–157.
[4] FENG C., SUTHERLAND A., KING R., MUGGLETON S., HENERY F., Comparison of machine learning classifiers to statistics and neural networks, Proceedings of the Third International Workshop in Artificial Intelligence and Statistics, 1993, pp. 41–52.
[5] FRIEDMAN N., GEIGER D., GOLDSZMIDT M., PROVAN G., LANGLEY P., SMYTH P., Bayesian Network Classifiers, Machine Learning, 1997, pp. 131–163.
[6] HALL M., FRANK E., HOLMES G., PFAHRINGER B., REUTEMANN P., WITTEN I. H., The WEKA Data Mining Software: An Update, SIGKDD Explorations, 2009, Vol. 11(1).
[7] JOSSINET J., Variability of impedivity in normal and pathological breast tissue, Med. & Biol. Eng. & Comput., 1996, Vol. 34, pp. 346–350.
[8] KRAWCZYK B., WOŹNIAK M., ORCZYK T., PORWIK P., MUSIALIK J., BŁOŃSKA-FAJFROWSKA B., Classification techniques for non-invasive recognition of liver fibrosis stage, Journal of MIT, Vol. 20, 2012, pp. 121–127.
[9] LITTLE R. J. A., RUBIN D. B., Statistical Analysis with Missing Data, New York: John Wiley & Sons, 1987.
[10] SAAR-TSECHANSKY M., PROVOST F., CARUANA R., Handling missing values when applying classification models, Journal of Machine Learning Research, 2007, Vol. 8, pp. 1217–1250.
[11] SCHAFER J. L., Analysis of Incomplete Multivariate Data, Chapman and Hall/CRC, 1997.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-240be0d8-fdd0-481d-94ff-e96a0e2b2d26