Fast Clustering on CUDA Platform

Andrzejewski, W.; Kaźmierczak, T.

Artykuł - szczegóły

Tytuł artykułu

Fast Clustering on CUDA Platform

Autorzy

Andrzejewski W. , Kaźmierczak T.

Wybrane pełne teksty z tego czasopisma

Identyfikatory

Języki publikacji

Abstrakty

K-Medoids clustering is very expensive. The basic algorithm PAM (Partitioning Around Medoids) does not scale very well for bigger datasets. To cope with this problem many modifications of the PAM algorithm, have been developed (ie. CLARANS). Unfortunately larger datasets still need to be clustered on computers whose computing power exceeds those of the normal desktop PCs. In this paper we present modifications of K-Medoids clustering algorithms PAM and CLARANS (Clustering Large Applications based on RANdomized Search) which utilize the graphics processing units (GPUs) of the modern graphics cards through CUDA (Compute Unified Device Architecture) platform to accelerate the most costly stages of these algorithms. We also present results of extensive performance experiments which show high improvements over old versions of these algorithms.

Słowa kluczowe

data mining clustering CUDA GPU

Wydawca

Wydawnictwo Politechniki Poznańskiej

Czasopismo

Foundations of Computing and Decision Sciences

Rocznik

2010

Tom

Vol. 35, No. 4

Strony

241--260

Opis fizyczny

Bibliogr. 34

Twórcy

autor

Andrzejewski W.

autor

Kaźmierczak T.

Poznan University of Technology, Piotrowo 2, 60-965 Poznań, Poland, witold.andrzejewski@cs.put.poznan.pl

Bibliografia

[1] B. J. Read, "Data Mining and Science?," in Proceedings of the 12th ERCIM Workshop on Database Research, 1999.
[2] N. Lavrač and B. Zupan, Data Mining and Knowledge Discovery Handbook, ch. Data Mining in Medicine, pp. 1107-1137. Springer US, 2005.
[3] B. Kovalerchuk and E. Vityaev, Data mining in finance: advances in relational and hybrid methods. Norwell, MA, USA: Kluwer Academic Publishers, 2000.
[4] C. Ling, , C. X. Ling, and C. Li, "Data Mining for Direct Marketing: Problems and Solutions," in Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD '98, pp. 73-79, AAAI Press, 1998.
[5] S. Dolničar, "Using cluster analysis for market segmentation - typical misconceptions, established methodological weaknesses and some recommendations for improvement," Australasian Journal of Market Research, vol. 11, no. 2, pp. 5-12, 2003.
[6] C. Carpineto, S. Osiński, G. Romano, and D. Weiss, "A survey of web clustering engines," A CM Comput. Surv., vol. 41, no. 3, pp. 1-38, 2009.
[7] E. Ramaraj and M. Punithavalli, "Taxonomically Clustering Organisms Based on the Profiles of Gene Sequences Using PCA," Journal of Computer Science, vol. 2, no. 3, pp. 292-296, 2006.
[8] P. Berkhin, Grouping Multidimensional Data, ch. A Survey of Clustering Data Mining Techniques, pp. 25-71. Springer Berlin Heidelberg, 2006.
[9] H.-r. Fang and Y. Saad, "Farthest centroids divisive clustering," in ICMLA '08: Proceedings of the 2008 Seventh International Conference on Machine Learning and Applications, (Washington, DC, USA), pp. 232-238, IEEE Computer Society, 2008.
[10] L. Kaufman and P. Rousseeuw, Finding Groups in Data An Introduction to Cluster Analysis. New York: Wiley Interscience, 1990.
[11] R. T. Ng and J. Han, "Clarans: A method for clustering objects for spatial data mining," IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 5, pp. 1003-1016, 2002.
[12] S.-C. Chu, J. F. Roddick, and J. Pan, "A comparative study and extension to k-medoids algorithms," in 5th International Conference on Optimization: Techniques and Applications (ICOTA 2001), (Hong Kong), 2001.
[13] T. Kohonen, M. R. Schroeder, and T. S. Huang, eds., Self-Organizing Maps. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2001.
[14] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD '96), pp. 226-231, 1996.
[15] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, "Optics: ordering points to identify the clustering structure," in Proceedings of the 1999 ACM SIGMOD international conference on Management of data (SIGMOD '99), (New York, NY, USA), pp. 49-60, ACM, 1999.
[16] A. Hinneburg and H.-H. Gabriel, "Denclue 2.0: fast clustering based on kernel density estimation," in Proceedings of the 7th international conference on Intelligent data analysis (IDA'07), (Berlin, Heidelberg), pp. 70-80, Springer-Verlag, 2007.
[17] W. Wang, J. Yang, and R. R. Muntz, "Sting: A statistical information grid approach to spatial data mining," in Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB'97), (San Francisco, CA, USA), pp. 186-195, Morgan Kaufmann Publishers Inc., 1997.
[18] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, "Automatic subspace clustering of high dimensional data for data mining applications," SIGMOD Record, vol. 27, no. 2, pp. 94-105, 1998.
[19] S. Guha, R. Rastogi, and K. Shim, "Rock: A robust clustering algorithm for categorical attributes," in Proceedings of the 15th International Conference on Data Engineering (ICDE '99), (Washington, DC, USA), p. 512, IEEE Computer Society, 1999.
[20] B. He, T. Tao, and K. C. chuan Chang, "Clustering structured web sources: a schema-based, model-differentiation approach," in ClustWeb Workshop (EDBT'04), pp. 536-546, 2004.
[21] V. Ganti, J. Gehrke, and R. Ramakrishnan, "Cactus-clustering categorical data using summaries," in Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '99), (New York, NY, USA), pp. 73-83, ACM, 1999.
[22] N. K. Govindaraju, J. Gray, R. Kumar, and D. Manocha, "Gputerasort: High performance graphics coprocessor sorting for large database management," in ACM SIGMOD International Conference on Management of Data, (Chicago, United States), June 2006
[23] A. Greβ and G. Zachmann, "Gpu-abisort: Optimal parallel sorting on stream architectures," in The 20th IEEE International Parallel and Distributed Processing Symposium, p. 45, April 2006.
[24] C. Sun, D. Agrawal, and A. E. Abbadi, "Hardware acceleration for spatial selections and joins," in Proceedings of the 2003 ACM SIGMOD international conference on Management of data (SIGMOD '03), (New York, NY, USA), pp. 455-466, ACM Press, 2003.
[25] N. K. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha, "Fast computation of database operations using graphics processors," in Proceedings of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD '04), (New York, NY, USA), pp. 215-226, ACM Press, 2004.
[26] S. Guha, S. Krishnan, and S. Venkatasubramanian, "Tutorial: Data visualization and mining using the gpu," 2005.
[27] F. Cao, A. Tung, and A. Zhou, "Scalable clustering using graphics processors.," in WAIM, pp. 372-384, 2006.
[28] C. Böhm, R. Noll, C. Plant, and B. Wackersreuther, "Density-based clustering using graphics processors," in CIKM '09: Proceeding of the 18th ACM conference on Information and knowledge management, (New York, NY, USA), pp. 661-670, ACM, 2009.
[29] C. Böhm, R. Noll, C. Plant, B. Wackersreuther, and A. Zherdin, "Data mining using graphics processing units," pp. 63-90, 2009.
[30] D.-J. Chang, M. M. Kantardzic, and M. Ouyang, "Hierarchical clustering with cuda/gpu.," in ISCA PDCCS (J. H. Graham and A. Skjellum, eds.), pp. 7-12, ISCA, 2009.
[31] W. Andrzejewski, "Fast K-Medoids Clustering on PCs," in Proceedings of the 3rd ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD '07), 2007.
[32] W. Andrzejewski and T. Kaźmierczak, "Wydajne grupowanie obiektów z wykorzystaniem technologii CUDA (polish)," in Proceedings of the 3rd National Conference on Data Processing Technologies (III Krajowa Konferencja Naukowa Technologie Przetwarzania Danych - KKNTPD 2010% pp. 466-483, WNT, 2010.
[33] M. Harris, "Optimizing parallel reduction in cuda." http://developer. download.nvidia.com/compute/cuda/l_l/Website/projects/reduction/ doc/reduction.pdf.
[34] NVIDIA CUDA Programming Guide, 2010.

Identyfikator YADDA

bwmeta1.element.baztech-article-BPP2-0019-0048