Scalable Clustering for Mining Local-Correlated Clusters in High Dimensions and Large Datasets

Lu, K-C.; Yang, D-L.

Article - details

Article title

Scalable Clustering for Mining Local-Correlated Clusters in High Dimensions and Large Datasets

Authors

Lu K-C., Yang D-L.

Selected full texts from this journal

https://fi.episciences.org/

Abstracts

Clustering is useful for mining the underlying structure of a dataset to support decision making, since target or high-risk groups can be identified. For high-dimensional datasets, however, the results of traditional clustering methods can be meaningless, because clusters may be apparent only with respect to a small subset of the features. Taking customer datasets as an example, some customers may be correlated through their salary and education, while others may be correlated through their job and house location. If one uses all the features of a customer for clustering, these local-correlated clusters may not be revealed. In addition, processing high-dimensional and large datasets is a challenging problem in decision making. Searching all combinations of every feature with every record to extract local-correlated clusters is infeasible, because the search space grows exponentially with data dimensionality and cardinality. In this paper, we propose a scalable 2-Leveled Approximated Hyper-Image-based Clustering framework, referred to as 2L-HIC-A, for mining local-correlated clusters, where the clustering process at each level requires only one scan of the original dataset. Moreover, the data-processing time of 2L-HIC-A can be independent of the input data size. In 2L-HIC-A, various well-developed image processing techniques can be exploited for mining clusters. Rather than proposing a new clustering algorithm, we provide a framework that can accommodate other clustering methods for mining local-correlated clusters and that sheds new light on existing clustering techniques.
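
As a rough illustration of the idea summarized above, the following Python sketch projects records onto a chosen feature subset, quantizes them into a grid of cell counts in a single scan, and then groups dense neighbouring cells with a connected-component pass, a basic image-processing operation. It is not the authors' 2L-HIC-A implementation: the function names, the fixed `cell_size`, the `min_count` density threshold, and the axis-neighbour connectivity are all assumptions made for this example.

```python
from collections import Counter, deque


def build_hyper_image(points, cell_size):
    """Single scan over the data: count how many points fall in each grid cell."""
    image = Counter()
    for point in points:
        cell = tuple(int(v // cell_size) for v in point)
        image[cell] += 1
    return image


def label_dense_cells(image, min_count):
    """Give the same label to dense cells that touch along an axis."""
    dense = {cell for cell, count in image.items() if count >= min_count}
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    neighbour = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if neighbour in dense and neighbour not in labels:
                        labels[neighbour] = next_label
                        queue.append(neighbour)
        next_label += 1
    return labels


def cluster_on_features(records, features, cell_size, min_count):
    """Cluster using only the selected feature indices (a hypothetical helper)."""
    projected = [tuple(record[f] for f in features) for record in records]
    image = build_hyper_image(projected, cell_size)
    return label_dense_cells(image, min_count)
```

Once the grid is built, the labelling step touches only occupied cells, so its cost depends on the grid resolution rather than on the number of records. This is one plausible reading of the abstract's statement about data-processing time, not a description of the published method.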

Keywords

local-correlated cluster, approximated clustering, high dimension, large dataset, image processing, morphology

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2010

Volume

Vol. 98, no. 1

Pages

15-32

Physical description

Bibliography: 31 items, tables, charts.

Contributors

author

Lu K-C.

author

Yang D-L.

Department of Information Engineering and Computer Science, Feng Chia University, 100 Wen Hwa Road, Taichung, Taiwan, ROC, kjlu@selab.iecs.fcu.edu.tw

Bibliography

[1] Aggarwal, C. C.: A human-computer interactive method for projected clustering, IEEE Transactions on Knowledge and Data Engineering, 16(4), 2004, 448-460.
[2] Aggarwal, C. C., Yu, P. S.: Redefining clustering for high-dimensional applications, IEEE Transactions on Knowledge and Data Engineering, 14(2), 2002, 210-225.
[3] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications, In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1998, 94-105.
[4] Baraldi, A., Blonda, P.: A survey of fuzzy clustering algorithms for pattern recognition - Part I, IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, 29(6), 1999, 778-785.
[5] Baraldi, A., Blonda, P.: A survey of fuzzy clustering algorithms for pattern recognition - Part II, IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics, 29(6), 1999, 786-801.
[6] Camastra, F., Verri, A.: A novel kernel method for clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 2005, 801-804.
[7] Chris, D., Xiaofeng, H., Hongyuan, Z., Horst, D. S.: Adaptive dimension reduction for clustering high dimensional data, In Proceedings of the 2002 IEEE International Conference on Data Mining, 2002, 147-154.
[8] Ester, M., Kriegel, H., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise, In Proceedings of the 2nd International Conference on KDD, 1996, 226-231.
[9] Estivill-Castro, V., Lee, I.: Hierarchical clustering based on spatial proximity using Delaunay diagram, In Proceedings of the 9th International Symposium on Spatial Data Handling, 2000, 7a.26-7a.41.
[10] Guha, S., Meyerson, A., Mishra, N., Motwani, R., O'Callaghan, L.: Clustering data streams: theory and practice, IEEE Transactions on Knowledge and Data Engineering, 15(3), 2003, 801-804.
[11] Guha, S., Rastogi, R., Shim, K.: CURE: An efficient clustering algorithm for large databases, In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1998, 73-84.
[12] Hinneburg, A., Keim, D. A.: Optimal grid-clustering: towards breaking the curse of dimensionality in high-dimensional clustering, In Proceedings of the 1999 Very Large Databases Conference, 1999, 506-517.
[13] Hoppner, F., Klawonn, F., Kruse, R., Runkler, T.: Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, 1999.
[14] Hung, M. C., Yang, D. L.: An efficient fuzzy C-means clustering algorithm, In Proceedings of the IEEE International Conference on Data Mining, 2001, 225-232.
[15] Kollios, G., Gunopulos, D., Koudas, N., Berchtold, S.: Efficient biased sampling for approximate clustering and outlier detection in large data sets, IEEE Transactions on Knowledge and Data Engineering, 15(5), 2003, 1170-1187.
[16] Lance, P., Ehtesham, H., Huan, L.: Subspace clustering for high dimensional data: a review, ACM SIGKDD Explorations, 6(1), 2004, 90-105.
[17] Lee, J., Lee, D.: An improved cluster labeling method for support vector clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3), 2005, 461-464.
[18] Lee, J., Lee, D.: Dynamic characterization of cluster structures for robust and inductive support vector clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 2006, 1869-1874.
[19] Lu, K. C., Yang, D. L.: Image Processing and Image Mining using Decision Trees, Journal of Information Science and Engineering, 25(4), 2009, 989-1003.
[20] Lu, K. C., Yang, D. L.: HIC: A Robust and Efficient Hyper-Image-Based Clustering for Very Large Datasets, Accepted by Journal of Information Science and Engineering, 2009.
[21] Lu, K. C.: HIC: Approximated Hyper-Image-Based Clustering for Breaking the Dimensionality Curse in High-Dimensional and Large Datasets, Submitted to Journal of Information Science and Engineering, 2009.
[22] Lu, K. C., Yang, D. L., Wu, J.: An Efficient Clustering Method for Very Large Datasets, WSEAS Transactions on Advances in Engineering Education, 3(2), 2005, 147-155.
[23] Mettu, R. R., Plaxton, C. G.: Optimal time bounds for approximate clustering, Machine Learning, 56(1-3), 2004, 35-60.
[24] Mihai, B., Sariel, H. P., Piotr, I.: Approximate clustering via core-sets, In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 2002, 250-257.
[25] Milligan, G. W., Cooper, M. C.: Methodology review: clustering methods, Applied Psychological Measurement, 11(4), 1987, 329-354.
[26] Ng, R. T., Han, J.: Efficient and effective clustering methods for spatial data mining, In Proceedings of the 20th VLDB Conference, 1994, 144-155.
[27] Nina, M., Dan, O., Leonard, P.: Sublinear time approximate clustering, In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, 2001, 439-447.
[28] Schikuta, E.: Grid-clustering: an efficient hierarchical clustering method for very large data sets, In Proceedings of the 13th International Conference on Pattern Recognition, 1996, 101-105.
[29] Van Hulle, M. M.: Density-based clustering with topographic maps, IEEE Transactions on Neural Networks, 10(1), 1999, 204-207.
[30] Wang, W., Yang, J., Muntz, R.: STING: a statistical information grid approach to spatial data mining, In Proceedings of the 23rd International Conference on VLDB, 1997, 186-195.
[31] Yip, A. M., Ding, C., Chan, T. F.: Dynamic cluster formation using level set methods, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(6), 2006, 877-889.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BUS8-0010-0002