PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Exploiting multi-core and many-core parallelism for subspace clustering

Treść / Zawartość
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of dimensions of the data. But the exponential increase in the number of subspaces with the dimensionality of data renders most of the algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependency in the clustering process, which means that parallelization becomes difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm which is scalable with the dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.
Rocznik
Strony
81--91
Opis fizyczny
Bibliogr. 43 poz., rys., tab., wykr.
Twórcy
  • School of Computer Science and Software Engineering, University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
  • School of Computer Science and Software Engineering, University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA 6009, Australia
autor
  • Department of Electrical Engineering and Information Technology, Offenburg University of Applied Sciences, Badstr. 24, 77652 Offenburg, Germany
  • Department of Electrical Engineering and Information Technology, Offenburg University of Applied Sciences, Badstr. 24, 77652 Offenburg, Germany
Bibliografia
  • [1] Aggarwal, C.C. and Reddy, C.K. (2013). Data Clustering: Algorithms and Applications, 1st Edn., Chapman & Hall/CRC.
  • [2] Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C. and Park, J.S. (1999). Fast algorithms for projected clustering, SIGMOD Record 28(2): 61–72.
  • [3] Agrawal, R., Gehrke, J., Gunopulos, D. and Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications, ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, Vol. 27, pp. 94–105.
  • [4] Alcantara, D.A.F. (2011). Efficient Hash Tables on the GPU, PhD thesis, University of California Davis, Davis, CA.
  • [5] Anderson, S.E. (2018). Bit Twiddling Hacks–compute the lexicographically next bit permutation, http://graphics.stanford.edu/˜seander/bithacks.html#NextBitPermutation.
  • [6] Berkhin, P. (2006). A survey of clustering data mining techniques, in J. Kogan et al. (Eds.), Grouping Multidimensional Data, Springer, Berlin/Heidelberg, pp. 25–71.
  • [7] Cheng, C.-H., Fu, A.W. and Zhang, Y. (1999). Entropy-based subspace clustering for mining numerical data, 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp. 84–93.
  • [8] Dagum, L. and Menon, R. (1998). OpenMP: An industry standard API for shared-memory programming, IEEE Computational Science Engineering 5(1): 46–55.
  • [9] Datta, A., Kaur, A., Lauer, T. and Chabbouh, S. (2017). Parallel subspace clustering using multi-core and many-core architectures, in M. Kirikova et al. (Eds.), New Trends in Databases and Information Systems, Springer International Publishing, Cham, pp. 213–223.
  • [10] Elhamifar, E. and Vidal, R. (2013). Sparse subspace clustering: Algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(11): 2765–2781.
  • [11] Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise, International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, pp. 226–231.
  • [12] Fan, J., Han, F. and Liu, H. (2014). Challenges of big data analysis, National Science Review 1(2): 293–314.
  • [13] Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press, San Diego, CA.
  • [14] Geiger, A., Lenz, P., Stiller, C. and Urtasun, R. (2013). Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research 32(11): 1231–1237.
  • [15] Google Scholar (2018). Search for ‘data clustering’, https://scholar.google.com/scholar?q=data+clustering&btnG=.
  • [16] Han, J., Kamber, M. and Pei, J. (2011). Data Mining: Concepts and Techniques, 3rd Edn., Morgan Kaufmann Publishers, San Francisco, CA.
  • [17] Harris, M., Sengupta, S. and Owens, J.D. (2007). Parallel prefix sum (scan) with CUDA, GPU Gems 3(39): 851–876.
  • [18] Jain, A.K. and Dubes, R.C. (1988). Algorithms for Clustering Data, Prentice-Hall, Inc., Upper Saddle River, NJ.
  • [19] Jain, A.K., Murty, M.N. and Flynn, P.J. (1999). Data clustering: A review, ACM Computing Surveys 31(3): 264–323.
  • [20] Joliffe, I.T. (2002). Principle Component Analysis, 2nd Edn., Springer, New York, NY.
  • [21] Jun, J., Chung, S. and McLeod, D. (2006). Subspace clustering of microarray data based on domain transformation, VLDB Workshop on Data Mining and Bioinformatics, Seoul, Korea, pp. 14–28.
  • [22] Kailing, K., Kriegel, H.-P. and Kröger, P. (2004). Density-connected subspace clustering for high-dimensional data, SIAM International Conference on Data Mining, Lake Buena Vista, FL, USA, Vol. 4, pp. 246–256.
  • [23] Kaur, A. and Datta, A. (2014). Subscale: Fast and scalable subspace clustering for high dimensional data, IEEE International Conference on Data Mining Workshop, Shenzhen, China, pp. 621–628.
  • [24] Kaur, A. and Datta, A. (2015). A novel algorithm for fast and scalable subspace clustering of high-dimensional data, Journal of Big Data 2(1): 1–24.
  • [25] Kriegel, H.-P., Kröger, P. and Zimek, A. (2009). Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering, ACM Transactions on Knowledge Discovery from Data 3(1): 1–58.
  • [26] Li, T., Ma, S. and Ogihara, M. (2004). Document clustering via adaptive subspace iteration, 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, pp. 218–225.
  • [27] Lichman, M. (2013). UCI machine learning repository, http: //archive.ics.uci.edu/ml.
  • [28] Loughry, J., van Hemert, J. and Schoofs, L. (2000). Efficiently enumerating the subsets of a set, http://www.applied-math.org/subset.pdf.
  • [29] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations, 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, Vol. 1, pp. 281–297.
  • [30] McCaffrey, J. (2004). Generating the MTH lexicographical element of a mathematical combination, MSDN Library, Microsoft, Redmond, WA.
  • [31] Murtagh, F. (1983). A survey of recent advances in hierarchical clustering algorithms, The Computer Journal 26(4): 354–359.
  • [32] Nagesh, H., Goil, S. and Choudhary, A. (2001). Adaptive grids for clustering massive data sets, 1st SIAM International Conference on Data Mining, Chicago, IL, USA, pp. 1–17.
  • [33] Nvidia CUDA (2018). CUDA parallel computing platform and programming model, http://www.nvidia.com/object/cuda_home_new.html.
  • [34] Parsons, L., Haque, E. and Liu, H. (2004). Subspace clustering for high dimensional data: A review, ACM SIGKDD Explorations Newsletter 6(1): 90–105.
  • [35] Sim, K., Gopalkrishnan, V., Zimek, A. and Cong, G. (2013). A survey on enhanced subspace clustering, Data Mining and Knowledge Discovery 26(2): 332–397.
  • [36] Steinbach, M., Ertöz, L. and Kumar, V. (2004). The challenges of clustering high dimensional data, in L.T. Wille (Ed.), New Directions in Statistical Physics, Springer, Berlin/Heidelberg, pp. 273–309.
  • [37] Strohm, P.T., Wittmer, S., Haberstroh, A. and Lauer, T. (2015). GPU-accelerated quantification filters for analytical queries in multidimensional databases, in N. Bassiliades et al. (Eds.), New Trends in Databases and Information Systems II, Springer, Cham, pp. 229–242.
  • [38] Thalamuthu, A., Mukhopadhyay, I., Zheng, X. and Tseng, G.C. (2006). Evaluation and comparison of gene clustering methods in microarray analysis, Bioinformatics 22(19): 2405–2412.
  • [39] Tierney, S., Gao, J. and Guo, Y. (2014). Subspace clustering for sequential data, IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, pp. 1019–1026.
  • [40] Xu, D. and Tian, Y. (2015). A comprehensive survey of clustering algorithms, Annals of Data Science 2(2): 165–193.
  • [41] Xu, R. and Wunsch, D. (2005). Survey of clustering algorithms, IEEE Transactions on Neural Networks 16(3): 645–678.
  • [42] Zhu, B., Mara, A. and Mozo, A. (2015). CLUS: Parallel subspace clustering algorithm on spark, in T. Morzy et al. (Eds.), New Trends in Databases and Information Systems, Communications in Computer and Information Science, Vol. 539, Springer International Publishing, Cham, pp. 175–185.
  • [43] Zhu, J., Liao, S., Lei, Z., Yi, D. and Li, S.Z. (2013). Pedestrian attribute classification in surveillance: Database and evaluation, ICCV Workshop on Large-Scale Video Search and Mining (LSVSM’13), Sydney, Australia, pp. 331–338.
Uwagi
PL
Opracowanie rekordu w ramach umowy 509/P-DUN/2018 ze środków MNiSW przeznaczonych na działalność upowszechniającą naukę (2019).
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-68d9047c-4971-41cc-9c95-8069b4807b04
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.