PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Eigenvalue-based incremental spectral clustering

Treść / Zawartość
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
Our previous experiments demonstrated that subsets of collections of (short) documents (with several hundred entries) share a common, normalized in some way, eigenvalue spectrum of combinatorial Laplacian. Based on this insight, we propose a method of incremental spectral clustering. The method consists of the following steps: (1) split the data into manageable subsets, (2) cluster each of the subsets, (3) merge clusters from different subsets based on the eigenvalue spectrum similarity to form clusters of the entire set. This method can be especially useful for clustering methods of complexity strongly increasing with the size of the data sample, like in case of typical spectral clustering. Experiments were performed showing that in fact the clustering and merging of subsets yield clusters close to clustering of the entire dataset. Our approach differs from other research streams in that we rely on the entire set (spectrum) of eigenvalues, whereas the other researchers concentrate on few eigenvectors related to lowest eigenvalues. Such eigenvectors are considered in the literature as of low reliability.
Rocznik
Strony
157--169
Opis fizyczny
Bibliogr. 33 poz., rys.
Twórcy
  • Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
  • Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
  • Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
Bibliografia
  • [1] J. Tu, G. Mei, and F. Picciallib, An improved Nyström spectral graph clustering using k-core decomposition as a sampling strategy for large networks, Journal of King Saud University - Computer and Information Sciences, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1319157822001379
  • [2] H. Sevi, M. Jonckheere, and A. Kalogeratos, Generalized spectral clustering for directed and undirected graphs, 2022, https://arxiv.org/abs/2203.03221.
  • [3] S. Wierzchoń and M. Kłopotek, Modern Clustering Algorithms, ser. Studies in Big Data. Springer Verlag, 2018, vol. 34.
  • [4] R. Janani and S. Vijayarani, Text document clustering using spectral clustering algorithm with particle swarm optimization, Expert Systems with Applications, vol. 134, pp. 192–200, 2019.
  • [5] P. Shrestha, C. Jacquin, and B. Daille, Clustering short text and its evaluation, in Computational Linguistics and Intelligent Text Processing. CICLing 2012, ser. Lecture Notes in Computer Science, A. Gelbukh, Ed., vol. 7182. Springer, Berlin, Heidelberg, 2012, p. 169–180, https://doi.org/10.1007/978-3-642-28601-8_15.
  • [6] F. Nie, Z. Zeng, I. Tsang, D. Xu, and C. Chang-shui, Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering, IEEE Trans. Neur. Netw., 2011.
  • [7] Y. Bengio, J. Paiement, P. Vincent, O. Delalleau, N. L. Roux, and M. Ouimet, Out-of-sample extensions for LLE, ISOMAP, MDS, eigenmaps, and spectral clustering, in NIPS 16, 2003, pp. 177–184.
  • [8] C. Alzate and J. Suykens, Multiway spectral clustering with out-of-sample extensions through weighted kernel pca, IEEE Tran. on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 335 – 347, 2010.
  • [9] D. Shen, X. Li, and G. Yan, Improve the spectral clustering by integrating a new modularity similarity index and out-of-sample extension, Modern Physics Letters B, vol. 34, no. 11, p. 2050105, 2020.
  • [10] P. Borkowski, M. Kłopotek, B. Starosta, S. Wierzchoń, and M. Sydow, Eigen-value based spectral classification, PLOS ONE, vol. 18, no. 4, p. e0283413, 2023, https://doi.org/10.1371/journal.pone.0283413.
  • [11] J. Chen, Convergence rate of krasulina estimator, Statistics & Probability Letters, vol. 155, p. 108562, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167715219302081.
  • [12] B. Mandal, X. Jiang, H.-L. Eng, and A. Kot, Prediction of eigenvalues and regularization of eigenfeatures for human face verification, Pattern Recognition Letters, vol. 31, no. 8, pp. 717–724, 2010, award winning papers from the 19th International Conference on Pattern Recognition (ICPR).
  • [13] S. T. Wierzchon and M. A. Klopotek, Spectral cluster maps versus spectral clustering, in Computer Information Systems and Industrial Management, ser. LNCS, vol. 12133. Springer, 2020, pp. 472–484. [Online]. Available: https://doi.org/10.1007/978-3-030-47679-3_40.
  • [14] D. Lee, T. Hoshi, T. Sogabe, Y. Miyatake, and S.-L. Zhang, Solution of the k-th eigenvalue problem in large-scale electronic structure calculations, Journal of Computational Physics, vol. 371, pp. 618–632, 2018.
  • [15] U. v. Luxburg, A tutorial on spectral clustering, Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007, http://dx.doi.org/10.1007/s11222-007-9033-z.
  • [16] S. Kamvar, D. Klein, and C. Manning, Spectral learning, in Proceedings of the 18th International Joint Conf. on Artificial intelligence. IJCAI’03, 2003, p. 561–566.
  • [17] R. Suganthi and S. Manimekalai, Spectral clustering based classification algorithm for text classification, International Journal of Engineering Science Invention (IJESI), pp. 36–41, 2018, http://www.ijesi.org/papers/NCIOT-2018/Volume-3/7.%2036-41.pdf.
  • [18] N. Rebagliati and A. Verri, Spectral clustering with more than K eigenvectors, Neurocomputing, vol. 74, no. 9, pp. 1391–1401, Apr 2011, https://doi.org/10.1016/j.neurocom.2010.12.08.
  • [19] S. Li and J. Hao, Spectral clustering-based semi-supervised sentiment classification, in Advanced Data Mining and Applications. ADMA 2012, S. Zhou, S. Zhang, and G. Karypis, Eds., vol. LNAI 7713. Springer-Verlag Berlin Heidelberg, 2012, pp. 271–283, https://doi.org/10.1007/978-3-642-35527-1_23.
  • [20] B. Liu, X. Shen, and W. Pan, Semi-supervised spectral clustering with application to detect population stratification, Frontiers in Genetics, p. Article 215, Oct 2013, https://doi.org/10.3389/fgene.2013.00215.
  • [21]W.-Y. Chen, Y. Song, H. Bai, C.-J. Lin, and E. Y. Chang, Parallel spectral clustering in distributed systems, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, pp. 568–586, 2011.
  • [22] D. Cai and X. Chen, Large scale spectral clustering via landmark-based sparse representation, IEEE Transactions on Cybernetics, vol. 45, no. 8, pp. 1669–1680, 2015.
  • [23] D. Huang, C.-D. Wang, J.-S. Wu, J.-H. Lai, and C.-K. Kwoh, Ultra-scalable spectral clustering and ensemble clustering, IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 6, pp. 1212–1226, 2020.
  • [24] D. Sitnik and I. Kopriva, Clustering and classification of low-dimensional data in explicit feature map domain: intraoperative pixel-wise diagnosis of adenocarcinoma of a colon in a liver, 2022.
  • [25] H. Li, X. Ye, A. Imakura, and T. Sakurai, Divide-and-conquer based large-scale spectral clustering, Neurocomputing, vol. 501, pp. 664–678, aug 2022. [Online]. Available: https://doi.org/10.1016.
  • [26] X. Hong, J. Gao, H. Wei, J. Xiao, and R. Mitchell, Two-step scalable spectral clustering algorithm using landmarks and probability density estimation,” Neurocomputing, vol. 519, pp. 173–186, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231222014497.
  • [27] H. Li, X. Ye, A. Imakura, and T. Sakurai, Lsec: Large-scale spectral ensemble clustering, 2021.
  • [28] C. Fowlkes, S. Belongie, F. Chung, and J. Malik, Spectral grouping using the Nyström method, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 2, 2004.
  • [29] C. Alzate and J. Suykens, A weighted kernel pca formulation with out-of-sample extensions for spectral clustering methods, in The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006, pp. 138–144.
  • [30] C. Alzate and J. A. K. Suykens, Out-of-sample eigenvectors in kernel spectral clustering, in The 2011 International Joint Conference on Neural Networks, 2011, pp. 2349–2356.
  • [31] S. Li, S. Ouyang, and Y. Liu, Understanding the generalization performance of spectral clustering algorithms, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37/7, 2023, pp. 8614–8621.
  • [32] K. D. Levin, F. Roosta-Khorasani, M. W. Mahoney, and C. E. Priebe, Out-of-sample extension of graph adjacency spectral embedding, in International Conference on Machine Learning, 2018.
  • [33] L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Chaspter 4. Englewood Cliffs, NJ: Prentice Hall, 1993.
Uwagi
Opracowanie rekordu ze środków MNiSW, umowa nr POPUL/SP/0154/2024/02 w ramach programu "Społeczna odpowiedzialność nauki II" - moduł: Popularyzacja nauki (2025).
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-1cf5409c-3bda-4a7b-906f-f648d10d5960
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.