Abstract
This paper asks whether the use of the kernel trick is justified, investigating the special case of its use in the kernel k-means algorithm. Kernel k-means is a clustering algorithm that allows data to be clustered much as k-means does when no embedding of the data points into Euclidean space is provided and only a matrix of "distances" (dissimilarities) or similarities is available. The kernel trick lets us bypass the need to find an embedding into Euclidean space. We show that the algorithm returns wrong results if the embedding does not actually exist, which means that the embedding must be found before the algorithm is used. If it is found, the kernel trick is pointless; if it is not found, the distance matrix needs to be repaired. But the repair methods require the construction of an embedding, which, first, makes the kernel trick pointless, because it is no longer needed, and, second, kernel k-means may return different clusterings before and after the repair, calling the value of the clustering into question. In this paper we identify a distance repair method that produces the same clustering before and after its application and does not need to be performed explicitly, so that the embedding need not be constructed explicitly. This renders the kernel trick applicable for kernel k-means.
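For orientation, here is a minimal sketch of the standard formulas behind the claims above; these are textbook formulations (cf. [14], [28], [41]) given as illustrative background, not excerpts from the paper. In kernel k-means, the squared distance from a point to a cluster centroid is evaluated from the kernel matrix $K$ alone, with no explicit embedding $\phi$:

$$\|\phi(x_i) - \mu_C\|^2 = K_{ii} - \frac{2}{|C|}\sum_{j \in C} K_{ij} + \frac{1}{|C|^2}\sum_{j,l \in C} K_{jl}.$$

A matrix $D$ of squared dissimilarities admits a Euclidean embedding exactly when the double-centred matrix $B = -\frac{1}{2} J D J$, with $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{\top}$, is positive semidefinite (Gower's criterion [14], [15]). If $B$ fails this test, one standard repair, in the spirit of Lingoes [28], adds a constant $\sigma > 0$ to every off-diagonal entry of $D$; assuming the usual within-cluster sum-of-squares cost, this changes the cost of any partition into $k$ clusters by an amount that does not depend on the partition itself,

$$\sum_{c=1}^{k} \frac{1}{2|C_c|} \sum_{\substack{i,j \in C_c \\ i \neq j}} \left(d_{ij}^2 + \sigma\right) \;=\; \sum_{c=1}^{k} \frac{1}{2|C_c|} \sum_{\substack{i,j \in C_c \\ i \neq j}} d_{ij}^2 \;+\; \frac{(n-k)\,\sigma}{2},$$

so the optimal clustering is the same before and after such a repair, and the shift can be applied to $K$ implicitly, without constructing the embedding.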
Pages
703–715
Physical description
Bibliography: 47 items
Authors
author
- Faculty of Mathematics and Natural Sciences, Cardinal Stefan Wyszyński University in Warsaw, ul. Wóycickiego 1/3, 01-938 Warsaw, Poland
author
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
author
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warsaw, Poland
Bibliography
- [1] Ackerman, M., Ben-David, S. and Loker, D. (2010). Towards property-based classification of clustering paradigms, in J. Lafferty et al. (Eds), Advances in Neural Information Processing Systems 23, Curran Associates, Red Hook, NY, pp. 10–18.
- [2] Balaji, R. and Bapat, R. (2007). On Euclidean distance matrices, Linear Algebra and Its Applications 424(1): 108–117.
- [3] Bao, T. and Kadobayashi, Y. (2008). On tighter inequalities for efficient similarity search in metric spaces, IAENG International Journal of Computer Science 35(3): IJCS_35_3_17.
- [4] Bradley, P.S., Mangasarian, O.L. and Street, W.N. (1996). Clustering via concave minimization, Proceedings of the 9th International Conference on Neural Information Processing Systems, NIPS’96, Denver, CO, USA, pp. 368–374.
- [5] Cailliez, F. (1983). The analytical solution of the additive constant problem, Psychometrika 48(2): 305–308.
- [6] Chitta, R., Jin, R., Havens, T. and Jain, A. (2011). Approximate kernel k-means: Solution to large scale kernel clustering, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, San Diego, CA, USA, pp. 895–903.
- [7] Choi, H. and Choi, S. (2005). Kernel Isomap on noisy manifold, 4th International Conference on Development and Learning, Osaka, Japan, pp. 208–213.
- [8] Cox, T.F. and Cox, M.A.A. (2001). Multidimensional Scaling, 2nd Edn, Chapman & Hall, London.
- [9] Demontis, A., Melis, M., Biggio, B., Fumera, G. and Roli, F. (2017). Super-sparse learning in similarity spaces, CoRR: abs/1712.06131.
- [10] Dokmanic, I., Parhizkar, R., Ranieri, J. and Vetterli, M. (2015). Euclidean distance matrices: A short walk through theory, algorithms and applications, CoRR: abs/1502.07541.
- [11] Du, L., Zhou, P., Shi, L., Wang, H., Fan, M., Wang, W. and Shen, Y. (2015). Robust multiple kernel k-means using l21-norm, Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, Buenos Aires, Argentina, pp. 3476–3482.
- [12] Gisbrecht, A. and Schleif, F.-M. (2015). Metric and non-metric proximity transformations at linear costs, Neurocomputing 167(1): 643–657.
- [13] Gonzalez, J. and Munoz, A. (2010). Representing functional data in reproducing kernel Hilbert spaces with applications to clustering and classification, Statistics and Econometrics Series 013, Working paper 10-27.
- [14] Gower, J.C. (1982). Euclidean distance geometry, The Mathematical Scientist 7: 1–14.
- [15] Gower, J.C. (1985). Properties of Euclidean and non-Euclidean distance matrices, Linear Algebra and Its Applications 67: 81–97.
- [16] Gower, J. and Legendre, P. (1986). Metric and Euclidean properties of dissimilarity coefficients, Journal of Classification 3(1): 5–48.
- [17] Handhayani, T. and Hiryanto, L. (2015). Intelligent kernel k-means for clustering gene expression, International Conference on Computer Science and Computational Intelligence (ICCSCI 2015), Jakarta, Indonesia, Vol. 59, pp. 171–177.
- [18] Higham, N.J. (1988). Computing a nearest symmetric positive semidefinite matrix, Linear Algebra and Its Applications 103: 103–118.
- [19] Hofmann, T., Schölkopf, B. and Smola, A.J. (2008). Kernel methods in machine learning, Annals of Statistics 36(3): 1171–1220.
- [20] Jacobs, D., Weinshall, D. and Gdalyahu, Y. (2000). Classification with nonmetric distances: Image retrieval and class representation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(6): 583–600.
- [21] Jain, A. and Zongker, D. (1997). Representation and recognition of handwritten digits using deformable templates, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(12): 1386–1390.
- [22] Jaworski, M. (2018). Regression function and noise variance tracking methods for data streams with concept drift, International Journal of Applied Mathematics and Computer Science 28(3): 559–567, DOI: 10.2478/amcs-2018-0043.
- [23] Kashima, H., Hu, J., Ray, B. and Singh, M. (2008). K-means clustering of proportional data using l1 distance, Proceedings of the 19th International Conference on Pattern Recognition, Tampa, FL, USA.
- [24] Kleinberg, J. (2002). An impossibility theorem for clustering, Proceedings of the 16th Neural Information Processing Systems Conference, NIPS 2002, Vancouver, BC, Canada, pp. 446–453.
- [25] Kłopotek, M. (2019). On the existence of kernel function for kernel-trick of k-means in the light of Gower theorem, Fundamenta Informaticae 168(1): 25–43.
- [26] Legendre, P. and Legendre, L. (1998). Numerical Ecology, 2nd Edn, Elsevier, Amsterdam.
- [27] Li, C., Georgiopoulos, M. and Anagnostopoulos, G.C. (2013). Kernel-based distance metric learning in the output space, International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, pp. 1–8.
- [28] Lingoes, J. (1971). Some boundary conditions for a monotone analysis of symmetric matrices, Psychometrika 36(2): 195–203.
- [29] Loosli, G., Canu, S. and Ong, C. (2016). Learning SVM in Krein spaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 38(6): 1204–1216.
- [30] Marin, D., Tang, M., Ayed, I.B. and Boykov, Y. (2019). Kernel clustering: Density biases and solutions, IEEE Transactions on Pattern Analysis and Machine Intelligence 41(1): 136–147.
- [31] Marteau, P.-F. (2019). Times series averaging and denoising from a probabilistic perspective on time-elastic kernels, International Journal of Applied Mathematics and Computer Science 29(2): 375–392, DOI: 10.2478/amcs-2019-0028.
- [32] Pan, V.Y. and Chen, Z.Q. (1999). The complexity of the matrix eigenproblem, Proceedings of the 31st Annual ACM Symposium on Theory of Computing, STOC’99, Atlanta, GA, USA, pp. 507–516.
- [33] Pękalska, E. and Duin, R. (2005). The Dissimilarity Representation for Pattern Recognition: Foundations and Applications (Machine Perception and Artificial Intelligence), World Scientific, River Edge, NJ.
- [34] Pękalska, E., Paclík, P. and Duin, R. (2002). A generalized kernel approach to dissimilarity-based classification, Journal of Machine Learning Research 2: 175–211.
- [35] Qi, H. (2016). A convex matrix optimization for constant problem in multidimensional scaling with application to LLE, SIAM Journal on Optimization 26(4): 2564–2590.
- [36] Richter, R., Kyprianidis, J., Springborn, B. and Alexa, M. (2017). Constrained modelling of 3-valent meshes using a hyperbolic deformation metric, Computer Graphics Forum 36(6): 62–75.
- [37] Roth, V., Laub, J., Kawanabe, M. and Buhmann, J. (2003). Optimal cluster preserving embedding of nonmetric proximity data, IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12): 1540–1551.
- [38] Sarma, T., Vishwanath, P. and Reddy, B. (2013). Single pass kernel k-means clustering method, Sadhana 38(3): 407–419.
- [39] Schleif, F. and Tino, P. (2015). Indefinite proximity learning: A review, Neural Computation 27(10): 2039–2096.
- [40] Schoenberg, I.J. (1938). Metric spaces and positive definite functions, Transactions of the American Mathematical Society 44(3): 522–536.
- [41] Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge.
- [42] Szalkai, B. (2013). An implementation of the relational k-means algorithm, CoRR: abs/1304.6899.
- [43] Szekely, G. and Rizzo, M. (2014). Partial distance correlation with methods for dissimilarities, CoRR: abs/1310.2926.
- [44] Tzortzis, G. and Likas, A.C. (2009). The global kernel k-means algorithm for clustering in feature space, IEEE Transactions on Neural Networks 20(7): 1181–1194.
- [45] Villmann, T., Kaden, M., Nebel, D. and Bohnsack, A. (2016). Similarities, dissimilarities and types of inner products for data analysis in the context of machine learning, in L. Rutkowski et al. (Eds), Artificial Intelligence and Soft Computing: 15th International Conference, ICAISC 2016, Springer, Cham, pp. 125–133.
- [46] Wierzchoń, S. and Kłopotek, M. (2018). Modern Clustering Algorithms, Springer, Cham.
- [47] Yao, Y. and Chen, H. (2018). Multiple kernel k-means clustering by selecting representative kernels, CoRR: abs/1811.00264.
YADDA identifier
bwmeta1.element.baztech-46ff5af2-4bf8-4a42-8a67-91c5e399828b