Center-based l 1 -clustering method

Sabo, K.

doi:10.2478/amcs-2014-0012

Artykuł - szczegóły

Tytuł artykułu

Center-based l₁-clustering method

Autorzy

Sabo K.

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.2478/amcs-2014-0012

Warianty tytułu

Języki publikacji

Abstrakty

In this paper, we consider the l₁-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function does not have to be either convex or differentiable, and generally it may have many local or global minima. Therefore, it becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved and the corresponding algorithm is given. The method is illustrated by and compared with some other clustering methods, especially with the l₂-clustering method, which is also known in the literature as a smooth k-means method, on a few typical situations, such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show in this case that the proposed l₁-clustering algorithm is faster and gives significantly better results than the l₂-clustering algorithm.

Słowa kluczowe

l1 clustering data mining optimization weighted median problem

metoda grupowania eksploracja danych optymalizacja

Wydawca

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Czasopismo

International Journal of Applied Mathematics and Computer Science

Rocznik

2014

Tom

Vol. 24, no. 1

Strony

151--163

Opis fizyczny

Bibliogr. 47 poz., rys., tab., wykr.

Twórcy

autor

Sabo K.

ksabo@mathos.hr

Department of Mathematics, University of Osijek, Trg Lj. Gaja 6, HR 31 000 Osijek, Croatia

Bibliografia

[1] Angulo, J. and Serra, J. (2007). Modelling and segmentation of colour images in polar representations, Image and Vision Computing 25(4): 475–495.
[2] Äyrämö, S. (2006). Knowledge Mining Using Robust Clustering, Ph.D. thesis, University of Jyväskylä, Jyväskylä.
[3] Bagirov, A.M. and Ugon, J. (2005). An algorithm for minimizing clustering functions, Optimization 54(4–5): 351–368.
[4] Bagirov, A.M., Ugon, J. and Webb, D. (2011). Fast modified global k-means algorithm for incremental cluster construction, Pattern Recognition 44(4): 886–876.
[5] Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer Academic Publishers, Norwell, MA.
[6] Boyd, D.L. and Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press, Cambridge.
[7] Chaovalitwongse, W.A., Butenko, S. and Pardalos, P.M., (Eds.) (2009). Clustering Challenges in Biological Networks, World Scientific, London.
[8] Choulakian, V. (2001). Robust q-mode principal component analysis in L1, Computational Statistics & Data Analysis, 37(2): 135–150.
[9] Clarke, F. H., (1990). Optimization and Nonsmooth Analysis, SIAM, Philadelphia, PA.
[10] Cominetti, R. and Michelot, C. (1997 ). Sufficient conditions for coincidence in l1-minisum multifacility location problems, Operations Research Letters 20(4): 179–185.
[11] Cord, A., Ambroise, C. and Cocquerez, J.-P. (2006 ). Feature selection in robust clustering based on Laplace mixture, Pattern Recognition Letters 27(6): 627–635.
[12] Cupec, R., Grbić, R., Sabo, K. and Scitovski, R. (2009). Three points method for searching the best least absolute deviations plane, Applied Mathematics and Computation 215(3): 983–994.
[13] Duda, R., Hart, P. and Stork, D. (2001). Pattern Classification, Wiley, New York, NY.
[14] Finkel, D.E. and Kelley, C.T. (2006). Additive scaling and the DIRECT algorithm, Journal of Global Optimization 36(4): 597–608.
[15] Floudas, C.A. and Gounaris, C.E. (2009). A review of recent advances in global optimization, Journal of Global Optimization 45(4): 3–38.
[16] Frąckiewicz, M. and Palus, H. (2011). KHM clustering techique as a segmentation method for endoscopic colour images, International Journal of Applied Mathematics and Computer Science 21(1): 203–209, DOI: 10.2478/v10006-011-0015-0.
[17] Gan, G., Ma, C. and Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications, SIAM, Philadelphia, PA.
[18] Grbić, R., Nyarko, E.K. and Scitovski, R. (2012). A modification of the direct method for Lipschitz global optimization for a symmetric function, Journal of Global Optimization, 57(4): 1193–1212, DOI: 10.1007/s10898-012-0020-3.
[19] Grbić, R., Scitovski, K., Sabo, K. and Scitovski, R. (2013). Approximating surfaces by the moving least absolute deviations method, Applied Mathematics and Computation 219(9): 4387–4399.
[20] Gurwitz, C. (1990). Weighted median algorithms for l1 approximation, BIT 30(2): 301–310.
[21] Hathaway, R.J. and Bezdek, J.C. (2001). Fuzzy c-means clustering of incomplete data, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31(5): 735–744.
[22] Hubert, L. and Arabie, P. (1985). Comparing partitions, Journal of Classification 2(1): 193–218.
[23] Jain, A. (2010). 50 years beyond k-means, Pattern Recognition Letters 31(8): 651–666.
[24] Jajuga, K. (1987). A clustering method based on the L1-norm, Computational Statistics & Data Analysis 5(4): 357–371.
[25] Jajuga, K. (1991). L1-norm based fuzzy clustering, Fuzzy Sets and Systems 39(1): 43–50.
[26] Iyigun, C. (2007). Probabilistic Distance Clustering, Ph.D. thesis, Graduate School, Rutgers, New Brunswick, NJ.
[27] Jones, D.R., Perttunen, C.D. and Stuckman, B.E. (1993). Lipschitzian optimization without the Lipschitz constant, Journal of Optimization Theory and Applications 79(1): 157–181.
[28] Jörnsten, R. (2004). Clustering and classification based on the L1 data depth, Journal of Multivariate Analysis 90(1): 67–89.
[29] Kogan, J. (2007). Introduction to Clustering Large and High-Dimensional Data, Cambridge University Press, Cambridge.
[30] Leisch, F. (2006). A toolbox for k-centroids cluster analysis, Computational Statistics & Data Analysis 51(2): 526–544.
[31] Li, X. Hu, W., Wang, H. and Zhang, Z. (2010). Linear discriminant analysis using rotational invariant L1 norm, Neurocomputing 73(13–15): 2571–2579.
[32] Scitovski, R. and Scitovski, S. (2013). A fast partitioning algorithm and its application to earthquake investigation, Computers and Geosciences 59(1): 124–131.
[33] Simiński, K. (2012). Neuro-rough-fuzzy approach for regression modelling from missing data, International Journal of Applied Mathematics and Computer Science 22(2): 461–476, DOI: 10.2478/v10006-012-0035-4.
[34] Späth, H. (1976). L1-cluster analysis, Computing 16(4): 379–387.
[35] Späth, H. (1987). Using the L1-norm within cluster analysis, in Y. Dodge (Ed.), Proceedings of the First International Conference on Statistical Data Analysis Based on the L1-Norm and Related Methods, University of Neuchatel/Switzerland, August 31–September 04, 1987, Elsevier, Amsterdam, pp. 427–434.
[36] Malinen, M.I. and Fränti, P. (2012). Clustering by analytic functions, Information Sciences 217(1): 31–38.
[37] Meng, D., Zhao, Q and Xu, Z. (2012). Improve robustness of sparse PCA by L1-norm maximization, Pattern Recognition 45(1): 487–497.
[38] Pintér, J.D. (1996). Global Optimization in Action (Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications), Kluwer Academic Publishers, Dordrecht.
[39] Ruszczynski, A (2006). Nonlinear Optimization, Princeton University Press, Princeton/Oxford, NJ.
[40] Sabo, K. and Scitovski, R. (2008). The best least absolute deviations line—properties and two efficient methods, ANZIAM Journal 50(2): 185–198.
[41] Sabo, K., Scitovski, R. and Vazler, I. (2011). Searching for a best LAD-solution of an overdetermined system of linear equations motivated by searching for a best LAD-hyperplane on the basis of given data, Journal of Optimization Theory and Applications 149(2): 293–314.
[42] Sabo, K., Scitovski, R. and Vazler, I. (2012). One-dimensional center-based l1-clustering method, Optimization Letters 7(1): 5–22.
[43] Sabo, K., Scitovski, R., Vazler, I. and Zekić-Sušac, M. (2011). Mathematical models of natural gas consumption, Energy Conversion and Management 52(3): 1721–1727.
[44] Teboulle, M. (2007). A unified continuous optimization framework for center-based clustering methods, Journal of Machine Learning Research 8(1): 65–102.
[45] Vardi, Y., Zhang, C. H. (2000). The multivariate L1-median and associated data depth, Proceedings of the National Academy of Sciences, United States of America 97(4): 1423–1426.
[46] Vazler, I., Sabo, K. and Scitovski, R. (2012). Weighted median of the data in solving least absolute deviations problems, Communications in Statistics—Theory and Methods 41(8): 1455–1465.
[47] Zhang, J., Peng, L., Zhao, X. and Kuruoglu E.E. (2012 ). Robust data clustering by learning multi-metric lq-norm distances, Expert Systems with Applications 39(1): 335–349.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-bde3d114-3c75-4484-aeaa-5d414ffd0125

Center-based l1-clustering method

Center-based l₁-clustering method