A new method for automatic determining of the DBSCAN parameters

Starczewski, Artur; Goetzen, Piotr; Er, Meng Joo

doi:10.2478/jaiscr-2020-0014

Article - details

Content

Full texts:

starczewski-A new method for automatic determining.pdf

Identifiers

DOI

10.2478/jaiscr-2020-0014

Abstracts

Clustering is an attractive technique used in many fields to deal with large-scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e., the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method detects sharp distance increases generated by a function which computes the distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.
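The idea mentioned in the abstract, a function giving the distance from each element to its k-th nearest neighbor and a search for sharp increases in it, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `kth_neighbor_distances` and `sharpest_increase`, the toy data, and the use of the largest ratio between consecutive sorted distances as the "sharp increase" criterion are all assumptions made for illustration only.

```python
import math

def kth_neighbor_distances(points, k):
    """For each point, return the Euclidean distance to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def sharpest_increase(sorted_dists):
    """Return (index, ratio) of the largest relative jump between consecutive values.

    Assumption: a "sharp increase" is measured as a ratio of neighboring values;
    the paper may use a different criterion.
    """
    best_index, best_ratio = 0, 0.0
    for i in range(1, len(sorted_dists)):
        prev = sorted_dists[i - 1]
        if prev > 0:
            ratio = sorted_dists[i] / prev
            if ratio > best_ratio:
                best_index, best_ratio = i, ratio
    return best_index, best_ratio

# Toy data (hypothetical): four points forming a tight group, plus two isolated points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (20, 20)]
k = 2  # plays the role of a MinPts-like neighbor count in this sketch

kdist = sorted(kth_neighbor_distances(points, k))
knee, ratio = sharpest_increase(kdist)
eps_candidate = kdist[knee - 1]  # last sorted distance before the jump

print("sorted k-dist:", [round(d, 3) for d in kdist])
print("jump at index", knee, "-> eps candidate", eps_candidate)
```

In this sketch the distance just before the jump is taken as a candidate eps, so that points in the dense group keep at least k neighbors within that radius while the isolated points do not. The brute-force neighbor search is quadratic in the number of points and is meant only for small examples.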

Keywords

clustering algorithms; DBSCAN; data mining

Publisher

University of Social Sciences

Journal

Journal of Artificial Intelligence and Soft Computing Research

Year

2020

Volume

Vol. 10, No. 3

Pages

209-221

Physical description

Bibliography: 37 items, figures.

Contributors

author

Starczewski Artur

artur.starczewski@pcz.pl

Department of Computer Engineering, Czestochowa University of Technology, al. Armii Krajowej 36, 42-200 Częstochowa, Poland

author

Goetzen Piotr

Information Technology Institute, University of Social Sciences, 90-113 Łódź, Poland, and Clark University, Worcester, MA 01610, USA

author

Er Meng Joo

School of Marine Electrical Engineering, Dalian Maritime University, China

Bibliography

[1] Ankerst M., Breunig M., Kriegel H.P., Sander J.: OPTICS: Ordering Points to Identify the Clustering Structure. Proceedings of the Int. Conf. on Management of Data, pp.49-60 (1999).
[2] Babu G.P., Murty M.N.: Simulated annealing for selecting optimal initial seeds in the k-means algorithm. Indian Journal of Pure and Applied Mathematics, Vol 25, pp.85-94 (1994).
[3] Bradley P., Fayyad U.: Refining initial points for k-means clustering. In Proceedings of the fifteenth international conference on knowledge discovery and data mining, New York, AAAI Press, pp. 9-15 (1998).
[4] Chen X., Liu W., Qiu H., Lai J.: APSCAN: A parameter free algorithm for clustering. Pattern Recognition Letters, Vol. 32, pp.973-986 (2011).
[5] Chen J.: Hybrid clustering algorithm based on pso with the multidimensional asynchronism and stochastic disturbance method. Journal of Theoretical and Applied Information Technology, Vol.46, pp.434-440 (2012).
[6] Chen Y., Tang S., Bouguila N., Wang C., Du J., Li H.: A Fast Clustering Algorithm based on pruning unnecessary distance computations in DBSCAN for High-Dimensional Data. Pattern Recognition, Vol.83, pp.375-387 (2018).
[7] Darong H., Peng W.: Grid-based dbscan algorithm with referential parameters. Physics Procedia, Vol.24, Part B, pp.1166-1170 (2012).
[8] Ester M., Kriegel H.P., Sander J., Xu X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp.226-231 (1996).
[9] Fränti P., Rezaei M., Zhao Q.: Centroid index: Cluster level similarity measure. Pattern Recognition, Vol.47, Issue 9, pp.3034-3045 (2014).
[10] Gabryel M.: The Bag-of-Words Method with Different Types of Image Features and Dictionary Analysis. Journal of Universal Computer Science 24(4), pp.357-371 (2018).
[11] Gabryel M.: Data Analysis Algorithm for Click Fraud Recognition. Communications in Computer and Information Science, Vol.920, pp.437-446 (2018).
[12] Gabryel M., Damaševičius R., Przybyszewski K.: Application of the Bag-of-Words Algorithm in Classification the Quality of Sales Leads. Lecture Notes in Computer Science, Vol. 10841, pp.615-622 (2018).
[13] Hruschka E.R., de Castro L.N., Campello R.J.: Evolutionary algorithms for clustering gene-expression data, In: Data Mining, 2004. ICDM’04. Fourth IEEE International Conference on Data Mining, pp.403-406, IEEE (2004).
[14] Jain A.K., Murty M.N., Flynn P.J.: Data Clustering: A Review. ACM Computing Surveys, Vol.31, No.3, pp.264-323 (1999).
[15] Karami A., Johansson R.: Choosing DBSCAN Parameters Automatically using Differential Evolution. International Journal of Computer Applications, Vol.91, pp.1-11 (2014).
[16] Lai W., Zhou M., Hu F., Bian K., Song Q.: A New DBSCAN Parameters Determination Method Based on Improved MVO. IEEE Access, Vol.7 (2019).
[17] Liu Z., Zhou D., Wu N.: Varied Density Based Spatial Clustering of Application with Noise. In proceedings of IEEE Conference ICSSSM, pp.528-531 (2007).
[18] Luchi D., Rodrigues A.L., Varejao F.M.: Sampling approaches for applying DBSCAN to large datasets. Pattern Recognition Letters, Vol.117, pp.90-96 (2019).
[19] Murtagh F.: A survey of recent advances in hierarchical clustering algorithms. Computer Journal, Vol.26, Issue 4, pp.354-359 (1983).
[20] Patrikainen A., Meila M.: Comparing Subspace Clusterings. IEEE Transactions on Knowledge and Data Engineering, Vol.18, Issue 7, pp.902-916 (2006).
[21] Pei Z., Xia Hua X., Han J.: The clustering algorithm based on particle swarm optimization algorithm. In Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation, Washington, USA. Vol.1, pp.148-151 (2008).
[22] Rohlf F.: Single-link clustering algorithms. In: P.R. Krishnaiah and L.N. Kanal (Eds.), Handbook of Statistics, Vol.2, pp.267-284 (1982).
[23] Sameh A.S., Asoke K.N.: Development of assessment criteria for clustering algorithms. Pattern Analysis and Applications, Vol.12, Issue 1, pp.79-98 (2009).
[24] Serdah A.M., Ashour W.M.: Clustering Large-scale Data Based on Modified Affinity Propagation Algorithm. Journal of Artificial Intelligence and Soft Computing Research, Volume 6, Issue 1, pp.23-33, DOI:10.1515/jaiscr-2016-0003 (2016).
[25] Shah G.H.: An improved dbscan, a density based clustering algorithm with parameter selection for high dimensional data sets. In Nirma University International Engineering (NUiCONE), pp.1-6 (2012).
[26] Sheikholeslami G., Chatterjee S., Zhang A.: WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. The International Journal on Very Large Data Bases, Vol.8, Issue 3-4, pp.289-304 (2000).
[27] Shieh H-L.: Robust validity index for a modified subtractive clustering algorithm. Applied Soft Computing, Vol.22, pp.47-59 (2014).
[28] Smiti A., Elouedi Z.: Dbscan-gm: An improved clustering method based on gaussian means and dbscan techniques. In 16th International Conference on Intelligent Engineering Systems (INES), pp. 573-578, (2012).
[29] Soni N., Ganatra A.: AGED (Automatic Generation of Eps for DBSCAN). Int. J. of Computer Science and Information Security, Vol.14, No.5, pp.536-559 (2016).
[30] Starczewski A.: A new validity index for crisp clusters. Pattern Analysis and Applications, Vol.20, Issue 3, pp.687-700 (2017).
[31] Starczewski A., Krzyzak A.: A Modification of the Silhouette Index for the Improvement of Cluster Validity Assessment. Lecture Notes in Computer Science, Vol.9693, pp.114-124 (2016).
[32] Tsekouras G.E.: A simple and effective algorithm for implementing particle swarm optimization in rbf networks design using input-output fuzzy clustering. Neurocomputing, Vol.108, pp.36-44 (2013).
[33] Viswanath P., Suresh Babu V.S.: Rough-dbscan: A fast hybrid density based clustering method for large data sets. Pattern Recognition Letters, Vol.30, Issue 16, pp.1477-1488 (2009).
[34] Wang W., Yang J., Muntz R.: STING: A Statistical Information Grid Approach to Spatial Data Mining. VLDB ’97 Proceedings of the 23rd International Conference on Very Large Data Bases, pp.186-195 (1997).
[35] Xue-yong L., Guo-hong G., Jia-xia S.: A new intrusion detection method based on improved dbscan. In International Conference on Information Engineering (ICIE), Vol.2, pp.117-120 (2010).
[36] Zalik K.R.: An efficient k-means clustering algorithm. Pattern Recognition Letters, Vol.29, Issue 9, pp.1385-1391 (2008).
[37] Zhou H., Wang P., Li H.: Research on adaptive parameters determination in DBSCAN algorithm. J. of Information and Computational Science, Vol.9, No.7, pp.1967-1973 (2012).

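Notes

Record prepared with funds from the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2020).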

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-a622ad05-ba90-46cb-bde1-f4ed15cf0b9f