Article title

A novel grid-based clustering algorithm

Publication languages
EN
Abstracts
EN
Data clustering is an important method for discovering naturally occurring structures in datasets. One of the most popular approaches is grid-based clustering. Methods of this kind are characterized by a fast processing time and can discover clusters of arbitrary shape, which makes them suitable for many different applications. Researchers have created many versions of grid-based clustering. However, the key issue is choosing the right number of grid cells. This paper proposes a novel grid-based algorithm that determines the number of grid cells automatically. The method is based on the kdist function, which computes the distance between each element of a dataset and its kth nearest neighbor. Experimental results obtained for several different datasets confirm the very good performance of the proposed method.
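The kdist function mentioned in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and excludes each point from its own neighbor list; the abstract specifies neither detail, and the function signature here is hypothetical. How the paper turns kdist values into a number of grid cells is not described in the abstract, so that step is not modeled.

```python
import math
from typing import List, Sequence


def kdist(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Return, for each point, the distance to its k-th nearest neighbor.

    Assumptions (not taken from the paper): Euclidean distance, and a
    point is not counted as its own neighbor.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    result = []
    for i, p in enumerate(points):
        # Sorted distances from point i to every other point (self skipped by index).
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # The k-th nearest neighbor sits at position k - 1 of the sorted list.
        result.append(others[k - 1])
    return result


# Three collinear points at x = 0, 1 and 3.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(kdist(points, 1))  # → [1.0, 1.0, 2.0]
```

This brute-force version needs O(n²) distance computations, so for large datasets a spatial index would be the more practical choice.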
Pages
319-330
Physical description
Bibliography: 28 items, figures.
Authors
  • Department of Intelligent Computer Systems, Czestochowa University of Technology, al. Armii Krajowej 36, 42-200 Czestochowa, Poland
  • Faculty of Management, Czestochowa University of Technology, Poland
  • Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Kraków, Poland
  • Management Department, University of Social Sciences, 90-113 Łódź, Poland
  • Clark University, Worcester, MA 01610, USA
author
  • School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Bibliography
  • [1] Agrawal R., Gehrke J., Gunopulos D., Raghavan P.: Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec., Vol. 27, pp. 94-105 (1998).
  • [2] Boonchoo T., Ao X., Liu Y., Zhao W., He Q.: Grid-based DBSCAN: Indexing and inference. Pattern Recognition, Vol. 90, pp. 271-284 (2019).
  • [3] Bradley P., Fayyad U.: Refining initial points for k-means clustering. In Proceedings of the fifteenth international conference on knowledge discovery and data mining, New York, AAAI Press, pp. 9-15 (1998).
  • [4] Chen Y., Tang S., Bouguila N., Wang C., Du J., Li H.: A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data. Pattern Recognition, Vol. 83, pp. 375-387 (2018).
  • [5] Darong H., Peng W.: Grid-based dbscan algorithm with referential parameters. Physics Procedia, Vol. 24, Part B, pp. 1166-1170 (2012).
  • [6] Ester M., Kriegel H.P., Sander J., Xu X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226-231 (1996).
  • [7] Fränti P., Rezaei M., Zhao Q.: Centroid index: Cluster level similarity measure. Pattern Recognition, Vol. 47, Issue 9, pp. 3034-3045 (2014).
  • [8] Gabryel M.: Data Analysis Algorithm for Click Fraud Recognition. Communications in Computer and Information Science, Vol. 920, pp. 437-446 (2018).
  • [9] Gan J., Tao Y.: Dbscan revisited: mis-claim, unfixability, and approximation. SIGMOD (2015).
  • [10] Grycuk R., Najgebauer P., Kordos M., Scherer M., Marchlewska A.: Fast Image Index for Database Management Engines. Journal of Artificial Intelligence and Soft Computing Research, Vol. 10, Issue 2, pp. 113-123 (2020).
  • [11] Hruschka E.R., de Castro L.N., Campello R.J.: Evolutionary algorithms for clustering gene-expression data, In: Data Mining, 2004. ICDM’04. Fourth IEEE International Conference on. pp. 403-406, IEEE (2004).
  • [12] Karami A., Johansson R.: Choosing DBSCAN Parameters Automatically using Differential Evolution. International Journal of Computer Applications, Vol. 91, pp. 1-11 (2014).
  • [13] Kumar K.M., Reddy A.R.M.: A fast DBSCAN clustering algorithm by accelerating neighbor searching using groups method. Pattern Recognition, Vol. 58, pp. 39-48 (2016).
  • [14] Liu F., Wen P., Zhu E.: Efficient Grid-based Clustering Algorithm with Leaping Search and Merge Neighbors Method. IOP Conf. Series: Materials Science and Engineering, Vol. 242 (2017).
  • [15] Luchi D., Rodrigues A.L., Varejao F.M.: Sampling approaches for applying DBSCAN to large datasets. Pattern Recognition Letters, Vol. 117, pp. 90-96 (2019).
  • [16] Meng X., van Dyk D.: The EM algorithm - An old folk-song sung to a fast new tune. Journal of the Royal Statistical Society, Series B (Methodological) Vol. 59, Issue 3, pp. 511-567 (1997).
  • [17] Murtagh F.: A survey of recent advances in hierarchical clustering algorithms. Computer Journal, Vol. 26, Issue 4, pp. 354-359 (1983).
  • [18] Patrikainen A., Meila M.: Comparing Subspace Clusterings. IEEE Transactions on Knowledge and Data Engineering, Vol. 18, Issue 7, pp. 902-916 (2006).
  • [19] Rohlf F.: Single-link clustering algorithms. In: P.R. Krishnaiah and L.N. Kanal (Eds.), Handbook of Statistics, Vol. 2, pp. 267-284 (1982).
  • [20] Sameh A.S., Asoke K.N.: Development of assessment criteria for clustering algorithms. Pattern Analysis and Applications, Vol. 12, Issue 1, pp. 79-98 (2009).
  • [21] Shah G.H.: An improved dbscan, a density based clustering algorithm with parameter selection for high dimensional data sets. In Nirma University International Conference on Engineering (NUiCONE), pp. 1-6 (2012).
  • [22] Sheikholeslami G., Chatterjee S., Zhang A.: WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. The International Journal on Very Large Data Bases, Vol. 8, Issue 3-4, pp. 289-304 (2000).
  • [23] Shieh H-L.: Robust validity index for a modified subtractive clustering algorithm. Applied Soft Computing, Vol. 22, pp. 47-59 (2014).
  • [24] Starczewski A.: A new validity index for crisp clusters. Pattern Analysis and Applications, Vol. 20, Issue 3, pp. 687-700 (2017).
  • [25] Starczewski A., Cader A.: Determining the Eps Parameter of the DBSCAN Algorithm. Lecture Notes in Computer Science, Vol. 11509, pp. 420-430 (2019).
  • [26] Wang W., Yang J., Muntz R.: STING: A Statistical Information Grid Approach to Spatial Data Mining. VLDB ’97 Proceedings of the 23rd International Conference on Very Large Data Bases, pp. 186-195 (1997).
  • [27] Viswanath P., Suresh Babu V.S.: Rough-dbscan: A fast hybrid density based clustering method for large data sets. Pattern Recognition Letters, Vol. 30, Issue 16, pp. 1477-1488 (2009).
  • [28] Zalik K.R.: An efficient k-means clustering algorithm. Pattern Recognition Letters, Vol. 29, Issue 9, pp. 1385-1391 (2008).
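Notes
Record prepared with funding from the Polish Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier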
bwmeta1.element.baztech-a00bada7-057d-470c-b466-8cc795d175ac