Comparison of incomplete data handling techniques for neuro-fuzzy systems

Sikora, M.; Simiński, K.

doi:10.7494/csci.2014.15.4.441

Article - details

Article title

Comparison of incomplete data handling techniques for neuro-fuzzy systems

Authors

Sikora M., Simiński K.

Identifiers

DOI

10.7494/csci.2014.15.4.441

Abstract

Real-life data sets often contain missing values. Incomplete data require either specialized algorithms or preprocessing that makes algorithms designed for complete data applicable. The paper compares various techniques for handling incomplete data in the neuro-fuzzy system ANNBFIS. The crucial procedure in creating a fuzzy model for a neuro-fuzzy system is the partition of the input domain. The most popular approach (also used in ANNBFIS) is clustering. The analyzed approaches to clustering incomplete data are preprocessing (marginalization and imputation) and specialized clustering algorithms (PDS, IFCM, OCS, NPS). The objective of our research is to compare the preprocessing techniques and the specialized clustering algorithms in order to find the most advantageous technique for handling incomplete data with a neuro-fuzzy system. This approach also serves as an indirect validation of clustering.
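
The abstract names the techniques without defining them, so the following is only a minimal illustrative sketch in Python, not the authors' implementation. It assumes that marginalization means dropping every data item with a missing value, that imputation is done with the column mean (only one of many possible imputation schemes), and that PDS denotes the partial distance strategy of Hathaway and Bezdek [11], in which the squared distance is computed over the present attributes and rescaled by the ratio of all attributes to present ones. The function names `marginalize`, `impute_mean`, and `partial_distance_sq` are hypothetical, and missing values are encoded as NaN by assumption.

```python
import math


def marginalize(rows):
    """Drop every data item (row) that has at least one missing (NaN) value."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]


def impute_mean(rows):
    """Replace each missing value with the mean of the present values in its column.

    Assumption: mean imputation stands in for "imputation" in general.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if not math.isnan(r[j])]
        # A column with no present values cannot be imputed; it stays NaN.
        means.append(sum(present) / len(present) if present else math.nan)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]


def partial_distance_sq(x, center):
    """Squared partial distance between a data item and a cluster center.

    Only attributes present in x contribute; the sum is rescaled by
    (number of attributes) / (number of present attributes).
    """
    present = [(a, c) for a, c in zip(x, center) if not math.isnan(a)]
    if not present:
        raise ValueError("data item has no present attributes")
    total = sum((a - c) ** 2 for a, c in present)
    return len(x) / len(present) * total


# Hypothetical toy data: the second item is missing its second attribute.
data = [[1.0, 2.0], [3.0, math.nan], [5.0, 6.0]]

complete_only = marginalize(data)   # keeps the first and third items
filled = impute_mean(data)          # fills the gap with the column mean
d2 = partial_distance_sq(data[1], [0.0, 0.0])
```

Marginalization and imputation produce a complete data set that any standard clustering algorithm can process, whereas a partial distance is meant to be used inside the clustering algorithm itself, which is the distinction between preprocessing and specialized algorithms drawn in the abstract.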

Keywords

incomplete data; marginalization; imputation; neuro-fuzzy system; ANNBFIS; PDS; IFCM; OCS; NPS

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2014

Volume

Vol. 15 (4)

Pages

441–458

Physical description

Bibliography: 37 items, figures, tables.

Contributors

author

Sikora M.

Independent researcher

author

Simiński K.

Krzysztof.Siminski@polsl.pl

Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science

Bibliography

[1] Acuña E., Rodriguez C.: The treatment of missing values and its effect in the classifier accuracy. In: D. Banks, L. House, F. McMorris, P. Arabie, W. G. (eds.), Classification, Clustering and Data Mining Applications, Springer, Berlin, Heidelberg, pp. 639–648. 2004.
[2] Bensaid A. M., Hall L. O., Bezdek J. C., Clarke L. P., Silbiger M. L., Arrington J. A., Murtagh R. F.: Validity-guided (re)clustering with applications to image segmentation. In: Transactions on Fuzzy Systems, vol. 4(2), pp. 112–123, 1996. ISSN 1063-6706.
[3] Box G. E. P., Jenkins G.: Time Series Analysis, Forecasting and Control. Holden-Day, Incorporated, Oakland, California, 1970.
[4] Cooke M., Green P., Josifovski L., Vizinho A.: Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication, vol. 34, pp. 267–285, 2001. URL http://dx.doi.org/10.1016/S0167-6393(00)00034-0.
[5] Czekalski P.: Evolution-Fuzzy Rule Based System with parameterized consequences. International Journal of Applied Mathematics and Computer Science, vol. 16(3), pp. 373–385, 2006.
[6] Czogała E., Łęski J.: Fuzzy and Neuro-Fuzzy Intelligent Systems. Series in Fuzziness and Soft Computing. Physica-Verlag, Springer-Verlag Company, Heidelberg, New York, 2000.
[7] Dunn J. C.: A Fuzzy Relative of the ISODATA Process and its Use in Detecting Compact, Well Separated Clusters. Journal of Cybernetics, vol. 3(3), pp. 32–57, 1973.
[8] Ghahramani Z., Jordan M.: Learning From Incomplete Data. Tech. rep., Lab Memo No. 1509, CBCL Paper No. 108, MIT AI Lab, 1995.
[9] Grzymała-Busse J., Goodwin L., Grzymala-Busse W., Zheng X.: Handling Missing Attribute Values in Preterm Birth Data Sets. D. Slezak, J. Yao, J. Peters, W. Ziarko, X. Hu, (eds.), Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, Lecture Notes in Computer Science, vol. 3642, pp. 342–351. Springer Berlin / Heidelberg, 2005. ISBN 978-3-540-28660-8.
[10] Grzymała-Busse J., Hu M.: A Comparison of Several Approaches to Missing Attribute Values in Data Mining. In: W. Ziarko, Y. Yao, (eds.), Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science, vol. 2005, pp. 378–385. Springer Berlin / Heidelberg, 2001. ISBN 978-3-540-43074-2.
[11] Hathaway R., Bezdek J.: Fuzzy c-means clustering of incomplete data. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 31(5), pp. 735–744, 2001. ISSN 1083-4419. URL http://dx.doi.org/10.1109/3477.956035.
[12] Jang J. S. R.: ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Transactions on Systems, Man, and Cybernetics, vol. 23(3), pp. 665–684, 1993.
[13] Kalton G., Kasprzyk D.: The treatment of missing survey data. Survey Methodology, vol. 12, pp. 1–16, 1986.
[14] Łęski J.: Systemy neuronowo-rozmyte (Neuro-fuzzy systems). Wydawnictwa Naukowo-Techniczne, Warszawa, 2008. ISBN 978-83-204-3229-9.
[15] Łęski J., Czogała E.: A new artificial neural network based fuzzy inference system with moving consequents in if-then rules and selected applications. Fuzzy Sets and Systems, vol. 108(3), pp. 289–297, 1999. ISSN 0165-0114. URL http://dx.doi.org/10.1016/S0165-0114(97)00314-X.
[16] Mackey M. C., Glass L.: Oscillation and chaos in physiological control systems. Science, vol. 197(4300), pp. 287–289, 1977.
[17] Matyja A., Simiński K.: Comparison of algorithms for clustering incomplete data. Foundations of Computing and Decision Sciences, vol. 39(2), pp. 107–127, 2014. URL http://dx.doi.org/10.2478/fcds-2014-0007.
[18] Mundfrom D. J., Whitcomb A.: Imputing Missing Values: The Effect on the Accuracy of Classification. Multiple Linear Regression Viewpoints, vol. 25(1), pp. 13–19, 1998.
[19] Nelles O., Fink A., Babuška R., Setnes M.: Comparison of Two Construction Algorithms for Takagi-Sugeno Fuzzy Models. International Journal of Applied Mathematics and Computer Science, vol. 10(4), pp. 835–855, 2000.
[20] Nelles O., Isermann R.: Basis function networks for interpolation of local linear models. Proceedings of the 35th IEEE Conference on Decision and Control, vol. 1, pp. 470–475, 1996.
[21] Pal N. R., Bezdek J. C.: On cluster validity for the fuzzy c-means model. Fuzzy Systems, IEEE Transactions on, vol. 3(3), pp. 370–379, 1995.
[22] Reichenbach H.: Wahrscheinlichkeitslogik. Erkenntnis, vol. 5, pp. 37–43, 1935. ISSN 0165-0106. URL http://dx.doi.org/10.1007/BF00172280.
[23] Rubin D.: Multiple Imputation For Nonresponse In Surveys. John Wiley & Sons, Inc., 1987.
[24] Sikora M., Krzystanek Z., Bojko B., Śpiechowicz K.: Application of a hybrid method of machine learning for description and on-line estimation of methane hazard in mine workings. Journal of Mining Sciences, vol. 47(4), pp. 493–505, 2011.
[25] Simiński K.: Neuro-fuzzy system with hierarchical domain partition. In: Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2008), pp. 392–397. IEEE Computer Society Publishing, Vienna, Austria, 2008. ISBN 978-0-7695-3514-2. URL http://dx.doi.org/10.1109/CIMCA.2008.67.
[26] Simiński K.: Patchwork neuro-fuzzy system with hierarchical domain partition. In: M. Kurzyński, M. Woźniak (eds.), Computer Recognition Systems 3, Advances in Intelligent and Soft Computing, vol. 57, pp. 11–18. Springer-Verlag, Berlin, Heidelberg, 2009. URL http://dx.doi.org/10.1007/978-3-540-93905-4_2.
[27] Simiński K.: Neuro-rough-fuzzy approach for regression modelling from missing data. International Journal of Applied Mathematics and Computer Science, vol. 22(2), pp. 461–476, 2012. URL http://dx.doi.org/10.2478/v10006-012-0035-4.
[28] Simiński K.: Clustering with missing values. Fundamenta Informaticae, vol. 123(3), pp. 331–350, 2013.
[29] Simiński K.: Rough fuzzy subspace clustering for data with missing values. Computing & Informatics, vol. 33(1), pp. 131–153, 2014.
[30] Simiński K.: Rough subspace neuro-fuzzy system. Fuzzy Sets and Systems, 2014. ISSN 0165-0114. URL http://dx.doi.org/10.1016/j.fss.2014.07.003.
[31] Timm H., Döring C., Kruse R.: Different approaches to fuzzy clustering of incomplete datasets. International Journal of Approximate Reasoning, vol. 35(3), pp. 239–249, 2004. ISSN 0888-613X. URL http://dx.doi.org/10.1016/j.ijar.2003.08.004. Integration of Methods and Hybrid Systems.
[32] Timm H., Kruse R.: Fuzzy cluster analysis with missing values. NAFIPS 1998 Conference of the North American Fuzzy Information Processing Society, pp. 242–246. 1998. URL http://dx.doi.org/10.1109/NAFIPS.1998.715573.
[33] Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R.B.: Missing value estimation methods for DNA microarrays. Bioinformatics, vol. 17(6), pp. 520–525, 2001. URL http://dx.doi.org/10.1093/bioinformatics/17.6.520.
[34] Wagstaff K. L., Laidler V. G.: Making the Most of Missing Values: Object Clustering with Partial Data in Astronomy. Proceedings of Astronomical Data Analysis Software and Systems XIV, vol. 347, pp. 172–176. Pasadena, California, USA, 2005.
[35] Xie X., Beni G.: A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13(8), pp. 841–847, 1991.
[36] Zhang C., Zhu X., Zhang J., Qin Y., Zhang S.: GBKII: An Imputation Method for Missing Values. Advances in Knowledge Discovery and Data Mining, vol. 4426, pp. 1080–1087, 2007.
[37] Zhang S.: Shell-neighbor method and its application in missing data imputation. In: Applied Intelligence, vol. 35(1), pp. 123–133, 2011. ISSN 0924-669X. URL http://dx.doi.org/10.1007/s10489-009-0207-6.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-9bc745fe-9d2d-4b81-8073-7dfd437cb3f7