Comparison of Algorithms for Clustering Incomplete Data

Matyja, A; Siminski, K

doi:10.2478/fcds-2014-0007

Article details

Abstract

Missing values are common in real data sets. Algorithms and methods designed for the analysis of complete data sets cannot always be applied to data with missing values. To use the existing methods for complete data, data sets with missing values are preprocessed. An alternative solution is to create new algorithms dedicated to data sets with missing values. The objective of our research is to compare the preprocessing techniques with the specialised algorithms and to identify where each is most advantageous.
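The preprocessing route described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that marginalisation means discarding incomplete data items and that imputation fills each gap with the mean of the observed values of that attribute (one common variant among many). The helper names `is_missing`, `marginalise`, and `impute_mean` are hypothetical.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing values (an assumption of this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def marginalise(rows):
    """Keep only the data items (rows) that have no missing values."""
    return [row for row in rows if not any(is_missing(v) for v in row)]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values of its attribute."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        # Assumption: an attribute with no observed values at all is filled with 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if is_missing(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Toy data set: four items, two attributes, two missing values.
data = [
    [1.0, 2.0],
    [3.0, None],
    [None, 6.0],
    [5.0, 4.0],
]

complete_only = marginalise(data)  # two complete items remain
filled = impute_mean(data)         # all four items remain, gaps filled
```

Either result can then be passed to a clustering method that expects complete data. Marginalisation loses data items, while imputation keeps them at the cost of introducing estimated values. The specialised algorithms mentioned in the abstract are meant to avoid this preprocessing step.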

Keywords

clustering, incomplete data, missing value, marginalisation, imputation, IFCM, OCS, NPS, NCS

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2014

Volume

Vol. 39, No. 2

Pages

107-127

Physical description

Bibliography: 27 items.

Contributors

author

Matyja A

Institute of Informatics, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland

author

Siminski K

Institute of Informatics, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland

Bibliography

[1] Acuña E., Rodriguez C., The treatment of missing values and its effect in the classifier accuracy. In Banks D., House L., McMorris F. R., Arabie P., Gaul W. (eds), Classification, Clustering and Data Mining Applications, Springer, Berlin, Heidelberg, 2004, 639-648.
[2] Alcalá-Fdez J., Fernandez A., Luengo J., Derrac J., García S., Sánchez L., Herrera F., KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. Journal of Multiple-Valued Logic and Soft Computing, 17, 2-3, 2011, 255-287.
[3] Bensaid A. M., Hall L. O., Bezdek J. C., Clarke L. P., Silbiger M. L., Arrington J. A., Murtagh R. F., Validity-guided (re)clustering with applications to image segmentation. IEEE Transactions on Fuzzy Systems, 4, 2, 1996, 112-123.
[4] Chan L., Gilman J., Dunn O., Alternative approaches to missing values in discriminant analysis. Journal of the American Statistical Association, 71, 356, 1976, 842-844.
[5] Czogała E., Łeski J., Fuzzy and Neuro-Fuzzy Intelligent Systems. Series in Fuzziness and Soft Computing. Physica-Verlag, A Springer-Verlag Company, Heidelberg, New York, 2000.
[6] Dempster A. P., Laird N. M., Rubin D. B., Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1, 1977, 1-38.
[7] Dunn J. C., A fuzzy relative of the ISODATA process and its use in detecting compact, well separated clusters. Journal of Cybernetics, 3, 3, 1973, 32-57.
[8] Ghahramani Z., Jordan M. I., Learning from incomplete data. Technical report, Lab Memo No. 1509, CBCL Paper No. 108, MIT AI Lab, 1995.
[9] Grzymała-Busse J., Hu M., A comparison of several approaches to missing attribute values in data mining. In Ziarko W., Yao Y. (eds), Rough Sets and Current Trends in Computing, volume 2005 of Lecture Notes in Computer Science, 378-385. Springer Berlin / Heidelberg, 2001.
[10] Grzymała-Busse J., Grzymała-Busse W., Handling missing attribute values. In Maimon O., Rokach L. (eds), The Data Mining and Knowledge Discovery Handbook, 37-57. Springer, 2005.
[11] Grzymała-Busse J., Siddhaye S., Rough set approaches to rule induction from incomplete data. In Proceedings of the IPMU’2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, volume 2, pages 923-930, 2004.
[12] Gustafson D., Kessel W., Fuzzy clustering with a fuzzy covariance matrix. In Decision and Control including the 17th Symposium on Adaptive Processes, 1978 IEEE Conference on, volume 17, pages 761-766, 1978.
[13] Hathaway R. J., Bezdek J. C., Fuzzy c-means clustering of incomplete data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 31, 5, 2001, 735-744.
[14] Himmelspach L., Conrad S., Fuzzy clustering of incomplete data based on cluster dispersion. In Hüllermeier E., Kruse R., Hoffmann F. (eds), Computational Intelligence for Knowledge-Based Systems Design, volume 6178 of Lecture Notes in Computer Science, pages 59-68. Springer Berlin / Heidelberg, 2010.
[15] Little R. J., Rubin D. B., Statistical analysis with missing data. John Wiley and Sons, New York, 1987.
[16] Pal N. R., Bezdek J. C., On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems, 3, 3, 1995, 370-379.
[17] Pal S. K., Majumder D. D., Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Transactions on Systems, Man, and Cybernetics, 7, 1977, 625-629.
[18] Renz C., Rajapakse J., Razvi K., Liang S., Ovarian cancer classification with missing data. In Proceedings of the 9th International Conference on Neural Information Processing, ICONIP’02, volume 2, pages 809-813, Singapore, 2002.
[19] Siminski K., Neuro-rough-fuzzy approach for regression modelling from missing data. International Journal of Applied Mathematics and Computer Science, 22, 2, 2012, 461-476.
[20] Siminski K., Clustering with missing values. Fundamenta Informaticae, 123, 3, 2013, 331-350.
[21] Timm H., Kruse R., Fuzzy cluster analysis with missing values. In NAFIPS 1998 Conference of the North American Fuzzy Information Processing Society, pages 242-246, 1998.
[22] Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R., Missing value estimation methods for DNA microarrays. Bioinformatics, 17, 6, 2001, 520-525.
[23] Wagstaff K., Laidler V., Making the most of missing values: Object clustering with partial data in astronomy. In Proceedings of Astronomical Data Analysis Software and Systems XIV, volume 347, pages 172-176, Pasadena, California, USA, 2005.
[24] Xie X., Beni G., A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 8, 1991, 841-847.
[25] Yao L., Weng K., Chang R., Fuzzy classification of incomplete data with adaptive volume. In ACIIDS ’09: Proceedings of the 2009 First Asian Conference on Intelligent Information and Database Systems, pages 232-237, Washington, DC, USA, 2009.
[26] Zhang C., Zhu X., Zhang J., Qin Y., Zhang S., GBKII: An imputation method for missing values. Advances in Knowledge Discovery and Data Mining, 4426, 2007, 1080-1087.
[27] Zhang S., Shell-neighbor method and its application in missing data imputation. Applied Intelligence, 35, 1, 2011, 1-11.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-6d78e768-6c60-4879-8803-a24254ef5788