A novel grid-based clustering algorithm

Starczewski, Artur; Scherer, Magdalena M.; Książek, Wojciech; Dębski, Maciej; Wang, Lipo

doi:10.2478/jaiscr-2021-0019

Abstract

Data clustering is an important method for discovering naturally occurring structures in datasets. Grid-based clustering is one of the most popular approaches: it is fast and can discover clusters of arbitrary shape, which makes it suitable for many different applications. Researchers have proposed many grid-based clustering methods, but the key issue remains choosing the right number of grid cells. This paper proposes a novel grid-based algorithm that determines the number of grid cells automatically. The determination is based on the kdist function, which computes the distance between each element of a dataset and its kth nearest neighbor. Experimental results obtained for several different datasets confirm the very good performance of the proposed method.
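
The kdist function mentioned in the abstract can be illustrated with a minimal sketch. This is a brute-force illustration only, not the authors' implementation: the function name `kdist`, the use of Euclidean distance, and the choice to exclude a point from its own neighbor list are assumptions, and the step that turns these distances into a number of grid cells is not described in this record and is not reproduced here.

```python
import math


def kdist(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor.

    Assumptions (not taken from the paper): Euclidean distance, and a
    point is not counted as its own neighbor. Brute force, O(n^2 log n).
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Hypothetical toy data: three nearby points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
sorted_kdist = sorted(kdist(points, k=2))
```

Sorting the kdist values, as in the last line, is a common way to inspect them: points in dense regions yield small values, while isolated points yield large ones.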

Keywords

data mining; grid-based clustering; grid structure

Publisher

University of Social Sciences

Journal

Journal of Artificial Intelligence and Soft Computing Research

Year

2021

Volume

Vol. 11, No. 4

Pages

319-330

Physical description

Bibliography: 28 items; figures.

Authors

author

Starczewski Artur

artur.starczewski@pcz.pl

Department of Intelligent Computer Systems, Czestochowa University of Technology, al. Armii Krajowej 36, 42-200 Czestochowa, Poland

author

Scherer Magdalena M.

Faculty of Management, Czestochowa University of Technology, Poland

author

Książek Wojciech

Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Kraków, Poland

author

Dębski Maciej

Management Department, University of Social Sciences, 90-113 Łódź, Poland
Clark University, Worcester, MA 01610, USA

author

Wang Lipo

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Bibliography

[1] Agrawal R., Gehrke J., Gunopulos D., Raghavan P.: Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec., vol. 27, pp. 94-105 (1998).
[2] Boonchoo T., Ao X., Liu Y., Zhao W., He Q.: Grid-based DBSCAN: Indexing and inference. Pattern Recognition, Vol. 90, pp. 271-284 (2019).
[3] Bradley P., Fayyad U.: Refining initial points for k-means clustering. In Proceedings of the fifteenth international conference on knowledge discovery and data mining, New York, AAAI Press, pp. 9-15 (1998).
[4] Chen Y., Tang S., Bouguila N., Wang C., Du J., Li H.: A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data. Pattern Recognition, Vol. 83, pp. 375-387 (2018).
[5] Darong H., Peng W.: Grid-based dbscan algorithm with referential parameters. Physics Procedia, 24, Part B, pp.1166-1170 (2012).
[6] Ester M., Kriegel H.P., Sander J., Xu X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226-231 (1996).
[7] Fränti P., Rezaei M., Zhao Q.: Centroid index: Cluster level similarity measure. Pattern Recognition, Vol. 47, Issue 9, pp. 3034-3045 (2014).
[8] Gabryel M.: Data Analysis Algorithm for Click Fraud Recognition. Communications in Computer and Information Science, Vol 920, pp.437-446 (2018).
[9] Gan J., Tao Y.: Dbscan revisited: mis-claim, unfixability, and approximation. SIGMOD (2015).
[10] Grycuk R., Najgebauer P., Kordos M., Scherer M., Marchlewska A.: Fast Image Index for Database Management Engines. Journal of Artificial Intelligence and Soft Computing Research, Vol. 10, Issue 2, pp.113 - 123 (2020)
[11] Hruschka E.R., de Castro L.N., Campello R.J.: Evolutionary algorithms for clustering gene-expression data, In: Data Mining, 2004. ICDM’04. Fourth IEEE International Conference on. pp. 403-406, IEEE (2004).
[12] Karami A., Johansson R.: Choosing DBSCAN Parameters Automatically using Differential Evolution. International Journal of Computer Applications, Vol. 91, pp.1-11 (2014)
[13] Kumar K.M., Reddy A.R.M.: A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognition, vol 58, pp.39-48 (2016).
[14] Liu F., Wen P. and Zhu E.: Efficient Grid-based Clustering Algorithm with Leaping Search and Merge Neighbors Method. IOP Conf. Series: Materials Science and Engineering, vol. 242 (2017)
[15] Luchi D., Rodrigues A.L., Varejao F.M.: Sampling approaches for applying DBSCAN to large datasets. Pattern Recognition Letters, Vol.117, pp.90-96 (2019).
[16] Meng X., van Dyk D.: The EM algorithm - An old folk-song sung to a fast new tune. Journal of the Royal Statistical Society, Series B (Methodological) Vol. 59, Issue 3, pp. 511-567 (1997).
[17] Murtagh F.: A survey of recent advances in hierarchical clustering algorithms. Computer Journal, Vol. 26, Issue 4, pp. 354-359 (1983).
[18] Patrikainen A., Meila M.: Comparing Subspace Clusterings, IEEE Transactions on Knowledge and Data Engineering, Vol.18, Issue 7, pp.902-916 (2006).
[19] Rohlf F.: Single-link clustering algorithms. In: P.R Krishnaiah and L.N. Kanal (Eds.), Handbook of Statistics, Vol. 2, pp. 267-284 (1982).
[20] Sameh A.S., Asoke K.N.: Development of assessment criteria for clustering algorithms. Pattern Analysis and Applications, Vol. 12, Issue 1, pp. 79-98 (2009).
[21] Shah G.H.: An improved dbscan, a density based clustering algorithm with parameter selection for high dimensional data sets. In Nirma University International Engineering,(NUiCONE) pp. 1-6 (2012).
[22] Sheikholeslami G., Chatterjee S., Zhang A.: WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. The International Journal on Very Large Data Bases, Vol. 8, Issue 3-4, pp. 289-304 (2000).
[23] Shieh H-L.: Robust validity index for a modified subtractive clustering algorithm. Applied Soft Computing, Vol. 22, pp. 47-59 (2014).
[24] Starczewski A.: A new validity index for crisp clusters. Pattern Analysis and Applications, Vol.20, Issue 3, pp. 687-700 (2017).
[25] Starczewski A., Cader A.: Determining the Eps Parameter of the DBSCAN Algorithm. Lecture Notes in Computer Science, Vol. 11509, pp. 420-430 (2019).
[26] Wang W., Yang J., Muntz R.: STING: A Statistical Information Grid Approach to Spatial Data Mining. VLDB ’97 Proceedings of the 23rd International Conference on Very Large Data Bases, pp. 186-195 (1997).
[27] Viswanath P., Suresh Babu V.S.: Rough-dbscan: A fast hybrid density based clustering method for large data sets. Pattern Recognition Letters, Vol. 30 Issue 16, pp.1477-1488 (2009).
[28] Zalik K.R.: An efficient k-means clustering algorithm. Pattern Recognition Letters, Vol.29, Issue 9, pp.1385-1391 (2008).

Notes

Record prepared with funds from MNiSW, agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2021).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-a00bada7-057d-470c-b466-8cc795d175ac