Article title

Prediction of Missing Values in Adult Data Set of UCI Machine Learning: A Case of Study

Publication languages
EN
Abstracts
EN
Incomplete data can be a serious problem for organizations that rely on it to make decisions. In this article, we propose using Shannon entropy and information gain to predict and impute missing categorical data in any data set. A worked example details how entropy is applied to determine the level of uncertainty of each attribute value. The missing attributes in the Adult data set of the UCI Machine Learning Repository are also imputed with other imputation techniques, to show the advantages of the proposed methodology.
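The abstract does not spell out the procedure, so the following is only a minimal sketch of one way Shannon entropy and information gain could drive categorical imputation, not the authors' method. It assumes that missing values are marked with "?", that the most informative attribute is the one with the highest information gain for the incomplete column, and that a gap is filled with the most frequent value among complete rows sharing that attribute's value. The function names and the toy rows are hypothetical; only the attribute names (workclass, education, sex) come from the Adult data set.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus its weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def impute(rows, target, missing="?"):
    """Fill missing `target` values using the attribute with the highest gain.

    Assumption of this sketch: the fill value is the mode of `target` among
    complete rows that share the chosen attribute's value, falling back to
    the overall mode when no such row exists.
    """
    complete = [r for r in rows if r[target] != missing]
    attributes = [a for a in rows[0] if a != target]
    best = max(attributes, key=lambda a: information_gain(complete, a, target))
    overall = Counter(r[target] for r in complete).most_common(1)[0][0]
    filled = []
    for r in rows:
        if r[target] == missing:
            matches = [c[target] for c in complete if c[best] == r[best]]
            value = Counter(matches).most_common(1)[0][0] if matches else overall
            r = {**r, target: value}
        filled.append(r)
    return best, filled


# Toy rows (invented for illustration, not taken from the Adult data set).
rows = [
    {"education": "Bachelors", "sex": "Male", "workclass": "Private"},
    {"education": "Bachelors", "sex": "Female", "workclass": "Private"},
    {"education": "Doctorate", "sex": "Male", "workclass": "State-gov"},
    {"education": "Doctorate", "sex": "Female", "workclass": "State-gov"},
    {"education": "Doctorate", "sex": "Male", "workclass": "?"},
]
best, filled = impute(rows, "workclass")
print(best, filled[-1]["workclass"])
```

In the toy rows, education separates workclass perfectly (gain of 1 bit) while sex carries no information (gain of 0), so the missing value is filled from the rows that share the same education level.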
Year
Volume
Pages
7-21
Physical description
Bibliography: 15 items, tables.
Authors
  • Tecnológico Nacional de México, Instituto Tecnológico de Apizaco
author
  • Tecnológico Nacional de México, Instituto Tecnológico de Apizaco
  • Tecnológico Nacional de México, Instituto Tecnológico de Apizaco
  • Tecnológico Nacional de México, Instituto Tecnológico de Apizaco
Bibliography
  • [1] B. D. Romo, “Ajuste demográfico por imputación,” Reality, Data and Space International Journal of Statistics and Geography, vol. 9, pp. 27-60, 2018.
  • [2] R. M. Gray, Entropy and Information Theory. Springer New York, Dordrecht Heidelberg London, 2011.
  • [3] C. A. Gonzalo, Teoría de la información, codificación y lenguajes. Ministerio de Educación y Ciencia, 1975.
  • [4] I. Myrtveit, E. Stensrud, and U. H. Olsson, “Analyzing data sets with missing data: An empirical evaluation of imputation methods and likelihood-based methods,” IEEE Transactions on Software Engineering, vol. 27, no. 11, pp. 999-1013, 2001.
  • [5] M. P. de Albuquerque, I. A. Esquef, and M. P. de Albuquerque, “Image segmentation using non extensive relative entropy,” IEEE Latin America Transactions, vol. 6, no. 5, pp. 477-483, 2008.
  • [6] G. Chhabra, V. Vashisht, and J. Ranjan, “A Comparison of Multiple Imputation Methods for Data with Missing Values,” Indian Journal of Science and Technology, vol. 10, no. 19, pp. 1-7, 2017.
  • [7] O. Lüdtke, A. Robitzsch, and S. Grund, “Multiple imputation of missing data in multilevel designs: A comparison of different strategies,” Psychological Methods, vol. 22, no. 1, pp. 141-165, 2017.
  • [8] A. B. Pedersen, E. M. Mikkelsen, D. Cronin-Fenton, N. R. Kristensen, T. M. Pham, L. Pedersen, and I. Petersen, “Missing Data and Multiple Imputation in Clinical Epidemiological Research,” Clinical Epidemiology, vol. 9, pp. 157-166, 2017.
  • [9] S. Rawal, S. C. Gupta, and S. Singh, “A Proposal for Predicting Missing Values in a Dataset Using Supervised Learning,” International Journal of Advanced Research in Computer Science, vol. 8, no. 8, pp. 562-567, 2017.
  • [10] R. Kohavi and B. Becker, “UCI Machine Learning Repository: Adult Data Set,” 1996. Available: https://archive.ics.uci.edu/ml/machine-learning-databases/adult/ [Online; accessed 15-12-2019].
  • [11] T. Raghunathan, P. A. Berglund, and P. W. Solenberger, Multiple Imputation in Practice: With Examples Using IVEware. Chapman and Hall/CRC, 2018.
  • [12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA Data Mining Software: An Update,” ACM SIGKDD Explorations Newsletter, vol. 11, no. 1, pp. 10-18, 2009.
  • [13] T. Aljuaid and S. Sasi, “Proper Imputation Techniques for Missing Values in Data Sets,” in 2016 International Conference on Data Science and Engineering (ICDSE). IEEE, 2016, pp. 1-5.
  • [14] D. T. Larose and C. D. Larose, Data Mining and Predictive Analytics. Wiley, 2015.
  • [15] J. Poulos and R. Valle, “Missing Data Imputation for Supervised Learning,” Applied Artificial Intelligence, vol. 32, no. 2, pp. 186-196, 2018.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-55717e0d-052a-4c97-bf28-ffdff8860db1