Article title

A new method for automatic determining of the DBSCAN parameters

Publication languages
EN
Abstracts
EN
Clustering is a widely used technique for handling large-scale data, and many clustering algorithms have been proposed so far. Density-based approaches are among the most popular because they can identify clusters of arbitrary shape in datasets. The most common of them is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been applied in many domains and has many modifications. However, choosing its two input parameters, i.e., the eps radius and the MinPts density threshold, remains a fundamental problem. The choice is especially difficult when density varies significantly within clusters. This paper proposes a new method that determines suitable values of these parameters for different kinds of clusters. The method detects sharp distance increases in the output of a function that computes the distance between each element of a dataset and its k-th nearest neighbor. Experimental results obtained for several different datasets confirm the very good performance of the proposed method.
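The abstract does not give the details of the procedure, so the following is only a minimal sketch of the general idea it names: compute each point's distance to its k-th nearest neighbor, sort those distances, and look for a sharp increase. The function names, the toy dataset, the choice k = 4, and the "largest single jump" rule are all assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np


def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    pts = np.asarray(points, dtype=float)
    # Full pairwise Euclidean distance matrix (acceptable for small datasets).
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)


def eps_from_sharpest_increase(k_dists):
    """Assumed heuristic: take eps as the k-distance just before the largest jump."""
    jumps = np.diff(k_dists)
    i = int(np.argmax(jumps))
    return float(k_dists[i])


# Toy data (hypothetical): a dense 5x5 unit grid plus two distant outliers.
grid = [(x, y) for x in range(5) for y in range(5)]
points = grid + [(50, 50), (-40, 30)]

k = 4  # assumed neighbor rank, not a value taken from the paper
k_dists = sorted_k_distances(points, k)
eps = eps_from_sharpest_increase(k_dists)
print(eps)
```

On this toy dataset the grid points have k-distances of at most 2, while the two outliers sit tens of units away, so the sharpest increase separates them cleanly. A single global jump like this would not cope with clusters of varying density, which is the case the abstract says the proposed method targets.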
Pages
209--221
Physical description
Bibliography: 37 items, figures.
Authors
  • Department of Computer Engineering, Czestochowa University of Technology, al. Armii Krajowej 36, 42-200 Częstochowa, Poland
  • Information Technology Institute, University of Social Sciences, 90-113 Łódź and Clark University, Worcester, MA 01610, USA
author
  • School of Marine Electrical Engineering Dalian Maritime University, China
Bibliography
  • [1] Ankerst M., Breunig M., Kriegel H.P., Sander J.: OPTICS: Ordering Points to Identify the Clustering Structure. Proceedings of the Int. Conf. on Management of Data, pp.49-60 (1999).
  • [2] Babu G.P., Murty M.N.: Simulated annealing for selecting optimal initial seeds in the k-means algorithm. Indian Journal of Pure and Applied Mathematics, Vol 25, pp.85-94 (1994).
  • [3] Bradley P., Fayyad U.: Refining initial points for k-means clustering. In Proceedings of the fifteenth international conference on knowledge discovery and data mining, New York, AAAI Press, pp. 9-15 (1998).
  • [4] Chen X., Liu W., Qiu H., Lai J.: APSCAN: A parameter free algorithm for clustering. Pattern Recognition Letters, Vol.32, pp.973-986 (2011).
  • [5] Chen J.: Hybrid clustering algorithm based on pso with the multidimensional asynchronism and stochastic disturbance method. Journal of Theoretical and Applied Information Technology, Vol.46, pp.434-440 (2012).
  • [6] Chen Y., Tang S., Bouguila N., Wang C., Du J., Li H.: A Fast Clustering Algorithm based on pruning unnecessary distance computations in DBSCAN for High-Dimensional Data. Pattern Recognition, Vol.83, pp.375-387 (2018).
  • [7] Darong H., Peng W.: Grid-based dbscan algorithm with referential parameters. Physics Procedia, Vol.24, Part B, pp.1166-1170 (2012).
  • [8] Ester M., Kriegel H.P., Sander J., Xu X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, pp.226-231 (1996).
  • [9] Fränti P., Rezaei M., Zhao Q.: Centroid index: Cluster level similarity measure. Pattern Recognition, Vol.47, Issue 9, pp.3034-3045 (2014).
  • [10] Gabryel M.: The Bag-of-Words Method with Different Types of Image Features and Dictionary Analysis. Journal of Universal Computer Science 24(4), pp.357-371 (2018).
  • [11] Gabryel M.: Data Analysis Algorithm for Click Fraud Recognition. Communications in Computer and Information Science, Vol.920, pp.437-446 (2018).
  • [12] Gabryel M., Damaševicius R., Przybyszewski K.: Application of the Bag-of-Words Algorithm in Classification the Quality of Sales Leads. Lecture Notes in Computer Science, Vol. 10841, pp.615-622 (2018).
  • [13] Hruschka E.R., de Castro L.N., Campello R.J.: Evolutionary algorithms for clustering gene-expression data. In: Data Mining, 2004. ICDM'04. Fourth IEEE International Conference on Data Mining, pp.403-406, IEEE (2004).
  • [14] Jain A.K., Murty M.N., Flynn P.J.: Data Clustering: A Review. ACM Computing Surveys, Vol.31, No.3, pp.264-323 (1999).
  • [15] Karami A., Johansson R.: Choosing DBSCAN Parameters Automatically using Differential Evolution. International Journal of Computer Applications, Vol.91, pp.1-11 (2014).
  • [16] Lai W., Zhou M., Hu F., Bian K., Song Q.: A New DBSCAN Parameters Determination Method Based on Improved MVO. IEEE Access, Vol.7 (2019).
  • [17] Liu Z., Zhou D., Wu N.: Varied Density Based Spatial Clustering of Application with Noise. In proceedings of IEEE Conference ICSSSM, pp.528-531 (2007).
  • [18] Luchi D., Rodrigues A.L., Varejao F.M.: Sampling approaches for applying DBSCAN to large datasets. Pattern Recognition Letters, Vol.117, pp.90-96 (2019).
  • [19] Murtagh F.: A survey of recent advances in hierarchical clustering algorithms. Computer Journal, Vol.26, Issue 4, pp.354-359 (1983).
  • [20] Patrikainen A., Meila M.: Comparing Subspace Clusterings. IEEE Transactions on Knowledge and Data Engineering, Vol.18, Issue 7, pp.902-916 (2006).
  • [21] Pei Z., Xia Hua X., Han J.: The clustering algorithm based on particle swarm optimization algorithm. In Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation, Washington, USA, Vol.1, pp.148-151 (2008).
  • [22] Rohlf F.: Single-link clustering algorithms. In: P.R Krishnaiah and L.N. Kanal (Eds.), Handbook of Statistics, Vol.2, pp.267-284 (1982).
  • [23] Sameh A.S., Asoke K.N.: Development of assessment criteria for clustering algorithms. Pattern Analysis and Applications, Vol.12, Issue 1, pp.79-98 (2009).
  • [24] Serdah A.M., Ashour W.M.: Clustering Large-scale Data Based on Modified Affinity Propagation Algorithm. Journal of Artificial Intelligence and Soft Computing Research, Vol.6, Issue 1, pp.23-33, DOI:10.1515/jaiscr-2016-0003 (2016).
  • [25] Shah G.H.: An improved dbscan, a density based clustering algorithm with parameter selection for high dimensional data sets. In Nirma University International Conference on Engineering (NUiCONE), pp.1-6 (2012).
  • [26] Sheikholeslami G., Chatterjee S., Zhang A.: WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. The International Journal on Very Large Data Bases, Vol.8, Issue 3-4, pp.289-304 (2000).
  • [27] Shieh H-L.: Robust validity index for a modified subtractive clustering algorithm. Applied Soft Computing, Vol.22, pp.47-59 (2014).
  • [28] Smiti A., Elouedi Z.: Dbscan-gm: An improved clustering method based on gaussian means and dbscan techniques. In 16th International Conference on Intelligent Engineering Systems (INES), pp. 573-578, (2012).
  • [29] Soni N., Ganatra A.: AGED (Automatic Generation of Eps for DBSCAN). Int. J. of Computer Science and Information Security, Vol.14, No.5, pp.536-559 (2016).
  • [30] Starczewski A.: A new validity index for crisp clusters. Pattern Analysis and Applications, Vol.20, Issue 3, pp.687-700 (2017).
  • [31] Starczewski A., Krzyzak A.: A Modification of the Silhouette Index for the Improvement of Cluster Validity Assessment. Lecture Notes in Computer Science, Vol.9693, pp.114-124 (2016).
  • [32] Tsekouras G.E.: A simple and effective algorithm for implementing particle swarm optimization in rbf networks design using input-output fuzzy clustering. Neurocomputing, Vol.108, pp.36-44 (2013).
  • [33] Viswanath P., Suresh Babu V.S.: Rough-dbscan: A fast hybrid density based clustering method for large data sets. Pattern Recognition Letters, Vol.30 Issue 16, pp.1477-1488 (2009).
  • [34] Wang W., Yang J., Muntz R.: STING: A Statistical Information Grid Approach to Spatial Data Mining. VLDB ’97 Proceedings of the 23rd International Conference on Very Large Data Bases, pp.186-195 (1997).
  • [35] Xue-yong L., Guo-hong G., Jia-xia S.: A new intrusion detection method based on improved dbscan. In International Conference on Information Engineering (ICIE), Vol.2, pp.117-120 (2010).
  • [36] Zalik K.R.: An efficient k-means clustering algorithm. Pattern Recognition Letters, Vol.29, Issue 9, pp.1385-1391 (2008).
  • [37] Zhou H., Wang P., Li H.: Research on adaptive parameters determination in DBSCAN algorithm. J. of Information and Computational Science, Vol.9, No.7, pp.1967-1973 (2012).
Notes
Record prepared with funding from MNiSW, agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-a622ad05-ba90-46cb-bde1-f4ed15cf0b9f