Hashtag Discernability - Competitiveness Study of Graph Spectral and Other Clustering Methods

Starosta, Bartłomiej; Kłopotek, Mieczysław; Wierzchoń, Slawomir T.; Czerski, Dariusz

doi:10.15439/2023F2398

Article details

Article title

Hashtag Discernability - Competitiveness Study of Graph Spectral and Other Clustering Methods

Authors

Starosta Bartłomiej , Kłopotek Mieczysław , Wierzchoń Slawomir T. , Czerski Dariusz

Selected full texts from this journal

http://annals-csis.org

Identifiers

DOI

10.15439/2023F2398

Title variants

Publication languages

Abstracts

Spectral clustering methods are claimed to be able to represent clusters of diverse shapes, densities, etc. They approximate graph cuts of various types (plain cuts, normalized cuts, ratio cuts) and are applicable to both unweighted and weighted similarity graphs. We evaluate these capabilities on clustering tasks of increasing complexity.

Keywords

computer science, clustering method, clustering algorithm, complexity theory, task analysis

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2023

Volume

Vol. 35

Pages

759–767

Physical description

24 references, tables, charts

Contributors

author

Starosta Bartłomiej

b.starosta@ipipan.waw.pl

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland

author

Kłopotek Mieczysław

klopotek@ipipan.waw.pl

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland

author

Wierzchoń Slawomir T.

stw@ipipan.waw.pl

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland

author

Czerski Dariusz

d.czerski@ipipan.waw.pl

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland

References

1. S. T. Wierzchoń and M. A. Kłopotek, Modern Clustering Algorithms, ser. Studies in Big Data, vol. 34. Springer Verlag, 2018. ISBN 978-3-319-69307-1. https://doi.org/10.1007/978-3-319-69308-8
2. P. Łoziński, D. Czerski, and M. A. Kłopotek, “Grammatical case based IS-A relation extraction with boosting for Polish,” in Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016, Gdańsk, Poland, September 11-14, 2016, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. A. Maciaszek, and M. Paprzycki, Eds., vol. 8. IEEE, 2016, pp. 533–540. https://doi.org/10.15439/2016F391
3. J. Dörpinghaus, S. Schaaf, J. Fluck, and M. Jacobs, “Document clustering using a graph covering with pseudostable sets,” in Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017, Prague, Czech Republic, September 3-6, 2017, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. A. Maciaszek, and M. Paprzycki, Eds., vol. 11, 2017, pp. 329–338. https://doi.org/10.15439/2017F84
4. P. Borkowski, M. A. Kłopotek, B. Starosta, S. T. Wierzchoń, and M. Sydow, “Eigenvalue based spectral classification,” PLoS ONE, vol. 18, no. 4, p. e0283413, 2023. https://doi.org/10.1371/journal.pone.0283413
5. I. S. Dhillon and D. S. Modha, “Concept decompositions for large sparse text data using clustering,” Machine Learning, vol. 42, no. 1, pp. 143–175, Jan. 2001. https://doi.org/10.1023/A:1007612920971
6. S. T. Wierzchoń and M. A. Kłopotek, “Spectral cluster maps versus spectral clustering,” in Computer Information Systems and Industrial Management, ser. LNCS, vol. 12133. Springer, 2020, pp. 472–484. https://doi.org/10.1007/978-3-030-47679-3_40
7. C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. Fernández del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant, “Array programming with NumPy,” Nature, vol. 585, pp. 357–362, 2020. https://doi.org/10.1038/s41586-020-2649-2. [Online]. Available: https://numpy.org
8. P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, and SciPy 1.0 Contributors, “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python,” Nature Methods, vol. 17, pp. 261–272, 2020. https://doi.org/10.1038/s41592-019-0686-2. [Online]. Available: https://scipy.org
9. L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine learning software: experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, pp. 108–122. http://dx.doi.org/10.48550/arXiv.1309.023. [Online]. Available: https://scikit-learn.org
10. H. Kim and H. K. Kim, “clustering4docs GitHub repository,” 2020, https://pypi.org/project/soyclustering/. [Online]. Available: https://github.com/lovit/clustering4docs
11. H. Kim, H. K. Kim, and S. Cho, “Improving spherical k-means for document clustering: Fast initialization, sparse centroid projection, and efficient cluster labeling,” Expert Systems with Applications, vol. 150, p. 113288, 2020. https://doi.org/10.1016/j.eswa.2020.113288. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417420301135
12. U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007. https://doi.org/10.48550/arXiv.0711.0189
13. P. Macgregor and H. Sun, “A tighter analysis of spectral clustering, and beyond,” in Proceedings of the 39th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, Eds., vol. 162. PMLR, 17–23 Jul 2022, pp. 14717–14742. https://doi.org/10.48550/arXiv.2208.01724. [Online]. Available: https://proceedings.mlr.press/v162/macgregor22a.html
14. Y. Xu, A. Srinivasan, and L. Xue, A Selective Overview of Recent Advances in Spectral Clustering and Their Applications. Cham: Springer International Publishing, 2021, pp. 247–277. ISBN 978-3-030-72437-5. https://doi.org/10.1007/978-3-030-72437-5_12
15. C. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008. https://doi.org/10.1017/CBO9780511809071
16. F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, “Spectral redemption in clustering sparse networks,” in Proceedings of the National Academy of Sciences, vol. 110, 2013, pp. 20935–20940. https://doi.org/10.48550/arXiv.1306.5550
17. H. T. Ali and R. Couillet, “Improved spectral community detection in large heterogeneous networks,” Journal of Machine Learning Research, vol. 18, no. 225, pp. 1–49, 2018. [Online]. Available: http://jmlr.org/papers/v18/17-247.html
18. A. Saade, F. Krzakala, and L. Zdeborová, “Spectral clustering of graphs with the Bethe Hessian,” 2014. https://doi.org/10.48550/ARXIV.1406.1880. [Online]. Available: https://arxiv.org/abs/1406.1880
19. Y. Endo and S. Miyamoto, “Spherical k-means++ clustering,” in Modeling Decisions for Artificial Intelligence, V. Torra and T. Narukawa, Eds. Cham: Springer International Publishing, 2015, pp. 103–114. ISBN 978-3-319-23240-9. https://doi.org/10.1007/978-3-319-23240-9_9
20. S. Ji, D. Xu, L. Guo, M. Li, and D. Zhang, “The seeding algorithm for spherical k-means clustering with penalties,” J. Comb. Optim., vol. 44, no. 3, pp. 1977–1994, Oct. 2022. https://doi.org/10.1007/s10878-020-00569-1
21. J. Knittel, S. Koch, and T. Ertl, “Efficient sparse spherical k-means for document clustering,” in Proceedings of the 21st ACM Symposium on Document Engineering, DocEng ’21. ACM, New York, NY, United States, 2021, pp. 1–4. https://doi.org/10.1145/3469096.3474937
22. R. Pratap, A. Deshmukh, P. Nair, and T. Dutt, “A faster sampling algorithm for spherical k-means,” in Proceedings of The 10th Asian Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Zhu and I. Takeuchi, Eds., vol. 95. PMLR, 14–16 Nov 2018, pp. 343–358. [Online]. Available: https://proceedings.mlr.press/v95/pratap18a.html
23. R. A. Kłopotek, M. A. Kłopotek, and S. T. Wierzchoń, “A feasible k-means kernel trick under non-Euclidean feature space,” International Journal of Applied Mathematics and Computer Science, vol. 30, no. 4, pp. 703–715, 2020. https://doi.org/10.34768/amcs-2020-0052. Online publication date: 1-Dec-2020.
24. J. Gower, “Some distance properties of latent root and vector methods used in multivariate analysis,” Biometrika, vol. 53, no. 3-4, pp. 325–338, 1966. https://doi.org/10.1093/biomet/53.3-4.325

Notes

1. Thematic Tracks Regular Papers

2. Record prepared with funds from MEiN (the Polish Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the programme "Social Responsibility of Science", module: Popularisation of science and promotion of sport (2024).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-49a857f4-749a-491a-ba85-de83d91435d0