PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

GrDBSCAN: A granular density-based clustering algorithm

Treść / Zawartość
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
Density-based spatial clustering of applications with noise (DBSCAN) is a commonly known and used algorithm for data clustering. It applies a density-based approach and can produce clusters of any shape. However, it has a drawback-its worst-case computational complexity is O(n2) with regard to the number of data items n. The paper presents GrDBSCAN: a granular modification of DBSCAN with reduced complexity. The proposed GrDBSCAN first granulates data into fuzzy granules and then runs density-based clustering on the resulting granules. The complexity of GrDBSCAN is linear with regard to the input data size and higher only for the number of granules. That number is, however, a parameter of the GrDBSCAN algorithm and is (significantly) lower than that of input data items. This results in shorter clustering time than in the case of DBSCAN. The paper is accompanied by numerical experiments. The implementation of GrDBSCAN is freely available from a public repository.
Rocznik
Strony
297--312
Opis fizyczny
Bibliogr. 67 poz., rys., tab., wykr.
Twórcy
autor
  • Department of Algorithmics and Software Silesian University of Technology ul. Akademicka 16, 44-100 Gliwice, Poland
  • Department of Algorithmics and Software Silesian University of Technology ul. Akademicka 16, 44-100 Gliwice, Poland
Bibliografia
  • [1] Agrawal, R., Gehrke, J., Gunopulos, D. and Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications, ACM SIGMOD Record 27(2): 94-105.
  • [2] Ankerst, M., Breunig, M., Kriegel, H.-P. and Sander, J. (1999). Optics: Ordering points to identify the clustering structure, ACM SIGMOD’99: International Conference on Management of Data, Philadelphia, USA, pp. 49-60.
  • [3] Bargiela, A. and Pedrycz, W. (2006). The roots of granular computing, 2006 IEEE International Conference on Granular Computing, Atlanta, USA, pp. 806-809.
  • [4] Chakraborty, C. and Chakraborty, D. (2006). A theoretical development on a fuzzy distance measure for fuzzy numbers, Mathematical and Computer Modelling 43(3): 254-261.
  • [5] Cheng, C.-H., Fu, A.W. and Zhang, Y. (1999). Entropy-based subspace clustering for mining numerical data, KDD’99: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, USA, pp. 84-93.
  • [6] Chicco, D. and Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation, BMC Genomics 21(6): 1-13.
  • [7] Cho, E., Myers, S.A. and Leskovec, J. (2011). Friendship and mobility: User movement in location-based social networks, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, San Diego, USA, pp. 1082-1090.
  • [8] Ciucci, D. (2016). Orthopairs and granular computing, Granular Computing 1: 159-170.
  • [9] Defays, D. (1977). An efficient algorithm for a complete link method, The Computer Journal 20(4): 364-366.
  • [10] Diamond, P. and Körner, R. (1997). Extended fuzzy linear models and least squares estimates, Computers & Mathematics with Applications 33(9): 15-32.
  • [11] Dunn, J.C. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact, well separated clusters, Journal Cybernetics 3(3): 32-57.
  • [12] Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, USA, pp. 226-231.
  • [13] Gan, J. and Tao, Y. (2015). DBSCAN revisited: Mis-claim, un-fixability, and approximation, Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD’15, Melbourne, Australia, pp. 519-530.
  • [14] Guha, S., Rastogi, R. and Shim, K. (2001). CURE: An efficient clustering algorithm for large databases, Information Systems 26(1): 35-58.
  • [15] Gunawan, A. (2013). A Faster Algorithm for DBSCAN, Master’s thesis, Eindhoven University of Technology, Eindhoven.
  • [16] Gustafson, D.E. and Kessel, W.C. (1978). Fuzzy clustering with a fuzzy covariance matrix, 1978 IEEE Conference on Decision and Control Including the 17th Symposium on Adaptive Processes, San Diego, USA, pp. 761-766.
  • [17] Hartuv, E. and Shamir, R. (2000). A clustering algorithm based on graph connectivity, Information Processing Letters 76(4): 175-181.
  • [18] Hinneburg, A. and Keim, D.A. (1998). An efficient approach to clustering in large multimedia databases with noise, Proceedings of the 4th International Conference on Knowledge Discovery and Datamining (KDD’98), New York, USA, pp. 58-65.
  • [19] Jajuga, K. (1991). L1-norm based fuzzy clustering, Fuzzy Sets and Systems 39(1): 43-50.
  • [20] Junlin, L. and Hongguang, F. (2011). Molecular dynamics-like data clustering approach, Pattern Recognition 44(8): 1721-1737.
  • [21] Karami, A. and Johansson, R. (2014). Choosing DBSCAN parameters automatically using differential evolution, International Journal of Computer Applications 91(7): 1-11.
  • [22] Karypis, G., Han, E.-H. and Kumar, V. (1999). Chameleon: A hierarchical clustering algorithm using dynamic modeling, IEEE Computer 32: 68-75.
  • [23] Keet, C.M. (2008). A Formal Theory of Granularity, PhD thesis, Free University of Bozen-Bolzano, Bolzano.
  • [24] Krishnapuram, R. and Keller, J. (1993). A possibilistic approach to clustering, IEEE Transactions on Fuzzy Systems 1(2): 98-110.
  • [25] Leski, J.M. (2016). Fuzzy c-ordered-means clustering, Fuzzy Sets and Systems 286: 114-133.
  • [26] Luo, M. and Cheng, Z. (2015). The distance between fuzzy sets in fuzzy metric spaces, 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD’15), Zhangjiajie, China, pp. 197-201.
  • [27] Matthews, B. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochimica et Biophysica Acta (BBA)-Protein Structure 405(2): 442-451.
  • [28] Michalak, M. and Stawarz, M. (2013). HRoBi-The algorithm for hierarchical rough bi-clustering, in L. Rutkowski et al. (Eds), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7895, Springer, Berlin/Heidelberg, pp. 194-205.
  • [29] Pedrycz, A., Hirota, K., Pedrycz, W. and Dong, F. (2012). Granular representation and granular computing with fuzzy sets, Fuzzy Sets and Systems 203: 17-32.
  • [30] Pedrycz, W. (2013). Granular Computing: Analysis and Design of Intelligent Systems, CRC Press, Boca Raton.
  • [31] Pedrycz, W., Succi, G., Sillitti, A. and Iljazi, J. (2015). Data description: A general framework of information granules, Knowledge-Based Systems 80: 98-108.
  • [32] Qian, Y.H., Liang, J.Y., Yao, Y.Y. and Dang, C.Y. (2010). MGRS: A multi-granulation rough set, Information Sciences 180(6): 949-970.
  • [33] Qian, Y., Liang, J., Zhi Z. Wu, W. and Dang, C. (2011). Information granularity in fuzzy binary GRC model, IEEE Transactions on Fuzzy Systems 19(2): 253-264.
  • [34] Salehi, S., Selamat, A. and Fujita, H. (2015). Systematic mapping study on granular computing, Knowledge-Based Systems 80: 78-97.
  • [35] Sander, J., Ester, M., Kriegel, H.-P. and Xu, X. (1998). Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications, Data Mining and Knowledge Discovery 2: 169-194.
  • [36] Schubert, E., Sander, J., Ester, M., Kriegel, H.P. and Xu, X. (2017). DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN, ACM Transactions on Database Systems 42(3): 1-21.
  • [37] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8): 888-905.
  • [38] Shifei, D., Li, X., Hong, Z. and Liwen, Z. (2010). Research and progress of cluster algorithms based on granular computing, International Journal of Digital Content Technology and its Applications 4(5): 96-104.
  • [39] Sibson, R. (1973). SLINK: An optimally efficient algorithm for the single-link cluster method, The Computer Journal 16(1): 30-34.
  • [40] Siminski, K. (2014). Rough fuzzy subspace clustering for data with missing values, Computing & Informatics 33(1): 131-153.
  • [41] Siminski, K. (2017). Fuzzy weighted c-ordered means clustering algorithm, Fuzzy Sets and Systems 318: 1-33.
  • [42] Siminski, K. (2019). NFL-Free library for fuzzy and neuro-fuzzy systems, in S. Kozielski et al. (Eds), Beyond Databases, Architectures and Structures. Paving the Road to Smart Data Processing and Analysis, Springer International Publishing, Cham, pp. 139-150.
  • [43] Siminski, K. (2020). GrFCM-Granular clustering of granular data, in A. Gruca et al. (Eds), Man-Machine Interactions 6, Springer International Publishing, Cham, pp. 111-121.
  • [44] Siminski, K. (2021a). GrNFS: A granular neuro-fuzzy system for regression in large volume data, International Journal of Applied Mathematics and Computer Science 31(3): 445-459, DOI: 10.34768/amcs-2021-0030.
  • [45] Siminski, K. (2021b). An outlier-robust neuro-fuzzy system for classification and regression, International Journal of Applied Mathematics and Computer Science 31(2): 303-319, DOI: 10.34768/amcs-2021-0021.
  • [46] Siminski, K. (2023). 3WDNFS-Three-way decision neuro-fuzzy system for classification, Fuzzy Sets and Systems 466, Article ID: 108432, DOI: 10.1016/j.fss.2022.10.021.
  • [47] Siminski, K. (2022a). FuBiNFS-Fuzzy biclustering neuro-fuzzy system, Fuzzy Sets and Systems 438: 84-106.
  • [48] Siminski, K. (2022b). Prototype based granular neuro-fuzzy system for regression task, Fuzzy Sets and Systems 449: 56-78.
  • [49] Skowron, A., Jankowski, A. and Dutta, S. (2016). Interactive granular computing, Granular Computing 1: 95-113.
  • [50] Starczewski, A., Goetzen, P. and Er, M.J. (2020). A new method for automatic determining of the DBSCAN parameters, Journal of Artificial Intelligence and Soft Computing Research 10(3): 209-221.
  • [51] Stawarz, M. and Michalak, M. (2012). eBi-The algorithm for exact biclustering, in L. Rutkowski et al. (Eds), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7268, Springer, Berlin/Heidelberg, pp. 327-334.
  • [52] Wang, W., Yang, J. and Muntz, R. (1997). STING: A statistical information grid approach to spatial data mining, Proceedings of the 23rd International Conference on Very Large Data Bases, San Francisco, USA, pp. 186-195.
  • [53] Wright, W. (1977). Gravitational clustering, Pattern Recognition 9(3): 151-166.
  • [54] Wu, C., Peng, Q., Lee, J., Leibnitz, K. and Xia, Y. (2021). Effective hierarchical clustering based on structural similarities in nearest neighbor graphs, Knowledge-Based Systems 228: 107295.
  • [55] Xia, S., Chen, L., Liu, S. and Yang, H. (2022). A new method for decision making problems with redundant and incomplete information based on incomplete soft sets: From crisp to fuzzy, International Journal of Applied Mathematics and Computer Science 32(4): 657-669, DOI: 10.34768/amcs-2022-0045.
  • [56] Yao, J.T., Vasilakos, A.V. and Pedrycz, W. (2013). Granular computing: Perspectives and challenges, IEEE Transactions on Cybernetics 43(6): 1977-1989.
  • [57] Yao, Y. (2007). The art of granular computing, in M. Kryszkiewicz et al. (Eds), Rough Sets and Intelligent Systems Paradigms, Springer, Berlin/Heidelberg, pp. 101-112.
  • [58] Yao, Y. (2008). Granular computing: Past, present and future, 2008 IEEE International Conference on Granular Computing, GrC 2008, Hangzhou, China, pp. 80-85.
  • [59] Yao, Y. (2009). Three-way decision: An interpretation of rules in rough set theory, in P. Wen et al. (Eds), Rough Sets and Knowledge Technology, Springer, Berlin/Heidelberg, pp. 642-649.
  • [60] Yao, Y. (2011). The superiority of three-way decisions in probabilistic rough set models, Information Sciences 181(6): 1080-1096.
  • [61] Yao, Y. (2016). A triarchic theory of granular computing, Granular Computing 1: 145-157.
  • [62] Yao, Y. (2018). Three-way decision and granular computing, International Journal of Approximate Reasoning 103: 107-123.
  • [63] Yao, Y. (2020). Three-way granular computing, rough sets, and formal concept analysis, International Journal of Approximate Reasoning 116: 106-125.
  • [64] Yao, Y. and Zhong, N. (2007). Granular computing, in B. Wah (Ed.), Wiley Encyclopedia of Computer Science and Engineering, Wiley, Hoboken, DOI: 101002/9780470050118.essse468.
  • [65] Zadeh, L.A. (1979). Fuzzy sets and information granularity, in N. Gupta et al. (Eds), Advances in Fuzzy Set Theory and Applications, North-Holland Publishing Co., Amsterdam, pp. 3-18.
  • [66] Zadeh, L.A. (1997). Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems 90(2): 111-127.
  • [67] Zadeh, L.A. (2002). From computing with numbers to computing with words-From manipulation of measurements to manipulation of perceptions, International Journal of Applied Mathematics and Computer Science 12(3): 307-324.
Uwagi
PL
Opracowanie rekordu ze środków MEiN, umowa nr SONP/SP/546092/2022 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2022-2023)
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-48b4d77b-bc38-44fb-95f1-4549019aeabb
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.