Article title

Influence of missing data imputation method on the classification accuracy of the medical data

Publication languages
EN
Abstracts
EN
The aim of this study is to show the dangers of filling in missing data, particularly in medical data. Because there are many dedicated medical expert systems and medical decision support systems, special attention must be paid to the construction of classifiers. Medical data are almost never complete, and completing the missing data requires special care. The safest approach to dealing with missing data would be to remove records with missing parameters and/or to remove parameters that are missing in the records. Unfortunately, reducing a data set that is already very small is not always an option. The article shows the dangers arising from data imputation by presenting the influence of selected missing-data filling algorithms on classification accuracy.
Year
Volume
Pages
111--116
Physical description
Bibliography: 11 items, tables, charts.
Contributors
author
  • University of Silesia
author
  • University of Silesia
Bibliography
  • [1] BERTHOLD M. R., CEBRON N., DILL F., GABRIEL T. R., KÖTTER T., MEINL T., OHL P., SIEB C., THIEL K., WISWEDEL B., KNIME: The Konstanz Information Miner, Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007), Springer, 2007.
  • [2] BREIMAN L., Random Forests, Machine Learning, 2001, Vol. 45(1), pp. 5–32.
  • [3] DIETTERICH T. G., An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization, Machine Learning, 2000, Vol. 40(2), pp. 139–157.
  • [4] FENG C., SUTHERLAND A., KING R., MUGGLETON S., HENERY F., Comparison of machine learning classifiers to statistics and neural networks, Proceedings of the Third International Workshop in Artificial Intelligence and Statistics, 1993, pp. 41–52.
  • [5] FRIEDMAN N., GEIGER D., GOLDSZMIDT M., PROVAN G., LANGLEY P., SMYTH P., Bayesian Network Classifiers, Machine Learning, 1997, pp. 131–163.
  • [6] HALL M., FRANK E., HOLMES G., PFAHRINGER B., REUTEMANN P., WITTEN I. H., The WEKA Data Mining Software: An Update, SIGKDD Explorations, 2009, Vol. 11(1).
  • [7] JOSSINET J., Variability of impedivity in normal and pathological breast tissue, Med. & Biol. Eng. & Comput., 1996, Vol. 34, pp. 346–350.
  • [8] KRAWCZYK B., WOŹNIAK M., ORCZYK T., PORWIK P., MUSIALIK J., BŁOŃSKA-FAJFROWSKA B., Classification techniques for non-invasive recognition of liver fibrosis stage, Journal of MIT, Vol. 20, 2012, pp. 121–127.
  • [9] LITTLE R. J. A., RUBIN D. B., Statistical Analysis with Missing Data, New York: John Wiley & Sons, 1987.
  • [10] SAAR-TSECHANSKY M., PROVOST F., CARUANA R., Handling missing values when applying classification models, Journal of Machine Learning Research, 2007, Vol. 8, pp. 1217–1250.
  • [11] SCHAFER J. L., Analysis of Incomplete Multivariate Data, Chapman and Hall/CRC, 1997.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-240be0d8-fdd0-481d-94ff-e96a0e2b2d26