

Article title

Regression SVM for incomplete data

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Applying machine learning methods to incomplete data is an important task in many scientific fields, such as medicine, biology, and face recognition. Typically, missing values are substituted with artificial values estimated from the known samples, and classical machine learning algorithms are then applied. Although this methodology is very common, it produces less informative data, because artificially generated values are treated in the same way as the known ones. In this paper, we consider a probabilistic representation of missing data, where each vector is identified with a Gaussian probability density function that models the uncertainty of absent attributes. This representation allows us to construct an analogue of the RBF kernel for incomplete data. We show that such a kernel can be successfully used in regression SVM. Experimental results confirm that our approach captures relevant information that is not captured by traditional imputation methods.
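The idea of replacing each incomplete vector with a Gaussian density and building an RBF-like kernel on top of it can be illustrated with a minimal sketch. The record does not give the paper's exact kernel, so the code below is an assumption: it uses the closed-form expected value of the RBF kernel between two independent Gaussians with diagonal covariance, with missing entries (marked `None`) filled by the per-feature mean and assigned the per-feature variance. The helper names `feature_stats`, `to_gaussian`, and `expected_rbf` are hypothetical.

```python
import math


def feature_stats(rows):
    """Per-feature mean and variance computed from observed (non-None) entries."""
    means, variances = [], []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        mu = sum(observed) / len(observed)
        var = sum((v - mu) ** 2 for v in observed) / len(observed)
        means.append(mu)
        variances.append(var)
    return means, variances


def to_gaussian(row, means, variances):
    """Represent a row as (mean vector, diagonal variances).

    Observed attributes keep their value with zero variance; missing
    attributes get the feature mean and the feature variance
    (an assumption made for this sketch).
    """
    m = [means[j] if v is None else v for j, v in enumerate(row)]
    s = [variances[j] if v is None else 0.0 for j, v in enumerate(row)]
    return m, s


def expected_rbf(g1, g2, gamma):
    """E[exp(-gamma * ||x - y||^2)] for independent x ~ N(m1, diag(s1)), y ~ N(m2, diag(s2))."""
    (m1, s1), (m2, s2) = g1, g2
    value = 1.0
    for a, b, va, vb in zip(m1, m2, s1, s2):
        scale = 1.0 + 2.0 * gamma * (va + vb)
        value *= math.exp(-gamma * (a - b) ** 2 / scale) / math.sqrt(scale)
    return value


# Toy data: the second row has a missing second attribute.
rows = [[1.0, 2.0], [3.0, None], [5.0, 6.0]]
means, variances = feature_stats(rows)
gaussians = [to_gaussian(r, means, variances) for r in rows]
gram = [[expected_rbf(p, q, gamma=0.5) for q in gaussians] for p in gaussians]
```

For complete vectors all variances are zero and the function reduces to the ordinary RBF kernel. A Gram matrix built this way could be passed to any SVR implementation that accepts precomputed kernels, for example scikit-learn's `SVR(kernel="precomputed")`.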
Year
Volume
Pages
23--35
Physical description
Bibliography: 24 items, figures.
Authors
author
  • Faculty of Mathematics and Computer Science Lojasiewicza 6, 30-348 Kraków
author
  • Faculty of Mathematics and Computer Science Lojasiewicza 6, 30-348 Kraków
author
  • Faculty of Mathematics and Computer Science Lojasiewicza 6, 30-348 Kraków
Bibliography
  • [1] Little R.J., D’Agostino R., Cohen M.L., Dickersin K., Emerson S.S., Farrar J.T., Frangakis C., Hogan J.W., Molenberghs G., Murphy S.A., et al., The prevention and treatment of missing data in clinical trials. New England Journal of Medicine, 2012, 367 (14), pp. 1355–1360.
  • [2] Acock A.C., What to do about missing values. American Psychological Association, 2012.
  • [3] Wagner A., Wright J., Ganesh A., Zhou Z., Mobahi H., Ma Y., Toward a practical face recognition system: Robust alignment and illumination by sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34 (2), pp. 372–386.
  • [4] Garcia-Laencina P.J., Sancho-Gómez J.L., Figueiras-Vidal A.R., Pattern classification with missing data: a review. Neural Computing and Applications, 2010, 19 (2), pp. 263–282.
  • [5] McKnight P.E., McKnight K.M., Sidani S., Figueredo A.J., Missing data: A gentle introduction. Guilford Press, 2007.
  • [6] Little R.J.A., Rubin D.B., Statistical analysis with missing data. John Wiley & Sons, 2014.
  • [7] Schafer J.L., Analysis of incomplete multivariate data. CRC Press, 1997.
  • [8] Ghahramani Z., Jordan M.I., Supervised learning from incomplete data via an EM approach. In: Advances in Neural Information Processing Systems, Citeseer, 1994, pp. 120–127.
  • [9] Azur M.J., Stuart E.A., Frangakis C., Leaf P.J., Multiple imputation by chained equations: what is it and how does it work? International Journal of Methods in Psychiatric Research, 2011, 20 (1), pp. 40–49.
  • [10] Williams D., Liao X., Xue Y., Carin L., Incomplete-data classification using logistic regression. In: Proceedings of the International Conference on Machine Learning, ACM, 2005, pp. 972–979.
  • [11] Smola A.J., Vishwanathan S., Hofmann T., Kernel methods for missing variables. In: Proceedings of the International Conference on Artificial Intelligence and Statistics, Citeseer, 2005.
  • [12] Williams D., Carin L., Analytical kernel matrix completion with incomplete multiview data. In: Proceedings of the ICML Workshop on Learning With Multiple Views, 2005.
  • [13] Shivaswamy P.K., Bhattacharyya C., Smola A.J., Second order cone programming approaches for handling missing and uncertain data. Journal of Machine Learning Research, 2006, 7, pp. 1283–1314.
  • [14] Chechik G., Heitz G., Elidan G., Abbeel P., Koller D., Max-margin classification of data with absent features. Journal of Machine Learning Research, 2008, 9, pp. 1–21.
  • [15] Grangier D., Melvin I., Feature set embedding for incomplete data. In: Advances in Neural Information Processing Systems, 2010, pp. 793–801.
  • [16] Struski L., Śmieja M., Tabor J., Incomplete data representation for SVM classification. https://arxiv.org/abs/1612.01480, 2016.
  • [17] Smola A.J., Schölkopf B., A tutorial on support vector regression. Statistics and Computing, 2004, 14 (3), pp. 199–222.
  • [18] Asuncion A., Newman D.J., UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/, 2007.
  • [19] Batista G.E., Monard M.C., An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence, 2003, 17 (5-6), pp. 519–533.
  • [20] Li D., Deogun J., Spaulding W., Shuart B., Towards missing data imputation: a study of fuzzy k-means clustering method. In: International Conference on Rough Sets and Current Trends in Computing, Springer, 2004, pp. 573–579.
  • [21] Honghai F., Guoshun C., Cheng Y., Bingru Y., Yumei C., A SVM regression based approach to filling in missing values. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer, 2005, pp. 581–587.
  • [22] Luengo J., Garcia S., Herrera F., A study on the use of imputation methods for experimentation with radial basis function network classifiers handling missing attribute values: The good synergy between RBFNs and EventCovering method. Neural Networks, 2010, 23 (3), pp. 406–418.
  • [23] Alcala-Fdez J., Sanchez L., Garcia S., del Jesus M.J., Ventura S., Garrell J.M., Otero J., Romero C., Bacardit J., Rivas V.M., et al., KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Computing, 2009, 13 (3), pp. 307–318.
  • [24] Demsar J., Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 2006, 7 (Jan),
Notes
Record prepared under agreement 509/P-DUN/2018, financed from funds of the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2018).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-d102c2f7-f045-45b5-a740-9ed02674e301