Article title

On the Consistency of k-means++ algorithm

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
In this paper we prove that the expected value of the objective function of the k-means++ algorithm on a sample converges to the population expected value. Since k-means++ provides a constant-factor approximation of the k-means objective for samples, such an approximation can also be achieved for the population as the sample size increases. This result is of potential practical relevance when considering subsampling for clustering large data sets (large databases).
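The abstract refers to the seeding step of k-means++ and its objective function. A minimal sketch of both is given below; the function names and the NumPy-based implementation are illustrative and not taken from the paper.

```python
import numpy as np

def kmeans_pp_seeding(X, k, seed=None):
    """k-means++ seeding (Arthur & Vassilvitskii, 2007): pick the first
    center uniformly at random, then pick each subsequent center with
    probability proportional to D(x)^2, the squared distance of x to
    the nearest center chosen so far."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen center
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(axis=1)
        # D^2-weighted sampling of the next center
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

def kmeans_objective(X, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()
```

The consistency result concerns the behaviour of this objective when the seeding is run on a subsample rather than the full data set: as the sample grows, the expected sample objective approaches the population one.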
Publisher
Year
Pages
361--377
Physical description
Bibliography: 12 items, figures.
Authors
  • Institute of Computer Science of the Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland
Bibliography
  • [1] MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proc. Fifth Berkeley Symp. on Math. Statist. and Prob., volume 1. Univ. of Calif. Press, 1967 pp. 281-297. URL https://projecteuclid.org/euclid.bsmsp/1200512992.
  • [2] Wierzchoń S, Kłopotek M. Modern Clustering Algorithms. Springer Verlag Series: Studies in Big Data 34. Springer Verlag, 2018.
  • [3] Arthur D, Vassilvitskii S. k-means++: the advantages of careful seeding. In: Bansal N, Pruhs K, Stein C (eds.), Proc. of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007. SIAM, New Orleans, Louisiana, USA, 2007 pp. 1027-1035. ISBN: 978-0-898716-24-5.
  • [4] Bejarano J, Bose K, Brannan T, Thomas A, Adragni K, Neerchal NK. Sampling Within k-Means Algorithm to Cluster Large Datasets. Technical Report HPCF-2011-12, High Performance Computing Facility, University of Maryland, Baltimore County, 2011. doi:10.2172/1025410.
  • [5] Pollard D. Strong Consistency of K-Means Clustering. Ann. Statist., 1981. 9(1):135-140. URL https://www.jstor.org/stable/2240876.
  • [6] Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY. A Local Search Approximation Algorithm for k-Means Clustering. In: 18th Annual ACM Symposium on Computational Geometry (SoCG’02), June 2002, Barcelona, Spain. 2002 pp. 10-18. doi:10.1145/513400.513402.
  • [7] Tang C, Monteleoni C. Scalable constant k-means approximation via heuristics on well-clusterable data. In: Poster Session of Learning faster from easy data II conference, Montreal, Canada.
  • [8] Kumar A, Sabharwal Y, Sen S. A simple linear time approximation algorithm for geometric k-means problem in any dimension. In: FOCS 2004. 2004 pp. 454-462. doi:10.1109/FOCS.2004.7.
  • [9] Balcan MF, Blum A, Gupta A. Approximate Clustering Without the Approximation. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’09. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2009 pp. 1068-1077. URL http://dl.acm.org/citation.cfm?id=1496770.1496886.
  • [10] Terada Y. Strong consistency of reduced K-means clustering. Scand. J. Stat., 2014. 41(4):913-931. URL https://doi.org/10.1111/sjos.12074.
  • [11] Mahajan M, Nimbhorkar P, Varadarajan K. The Planar k-Means Problem is NP-Hard. In: Proceedings of the 3rd International Workshop on Algorithms and Computation, WALCOM ’09. Springer-Verlag, Berlin, Heidelberg. 2009 pp. 274-285, ISBN: 978-3-642-00201-4.
  • [12] Bachem O, Lucic M, Hassani SH, Krause A. Approximate K-means++ in Sublinear Time. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16. AAAI Press, 2016 pp. 1459-1467.
Notes
Record compiled with funding from MNiSW, agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-bdef7c64-4fcb-426a-be57-684f7af2f589