Article title

Low Distortion Embedding of the Hamming Space into a Sphere with Quadrance Metric and k-means Clustering of Nominal-continuous Data

Publication languages
EN
Abstracts
EN
In this article we propose a new clustering algorithm for data that combine continuous and nominal attributes. The algorithm embeds the nominal data into the unit sphere equipped with the quadrance metric and adapts the general k-means clustering algorithm to the embedded data. We also show that the distortion of the new embedding with respect to the Hamming metric is lower than that of the other embeddings considered. A series of numerical experiments on real and synthetic datasets shows that the proposed algorithm is a comparable alternative to other clustering algorithms for combinations of continuous and nominal data.
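The abstract does not spell out the embedding, so the following Python sketch is only an illustration under stated assumptions, not the authors' construction. It assumes each nominal attribute is one-hot encoded and the result is scaled by 1/sqrt(m), where m is the number of nominal attributes, so that every record lands on the unit sphere. Under that assumption the quadrance (squared Euclidean distance) between two records equals 2H/m, where H is their Hamming distance. The function names `embed_nominal`, `quadrance`, `hamming`, and `kmeans` are hypothetical, and the clustering step is plain Lloyd-style k-means with arithmetic-mean centroids rather than the adapted variant the article describes.

```python
import numpy as np

def embed_nominal(X_nom):
    """One-hot encode each nominal column and scale rows to unit norm.

    Assumption for illustration: scaling by 1/sqrt(m) puts every row on
    the unit sphere, because each row has exactly m ones before scaling.
    """
    X_nom = np.asarray(X_nom, dtype=object)
    n, m = X_nom.shape
    blocks = []
    for j in range(m):
        _, codes = np.unique(X_nom[:, j].astype(str), return_inverse=True)
        codes = codes.ravel()
        block = np.zeros((n, codes.max() + 1))
        block[np.arange(n), codes] = 1.0
        blocks.append(block)
    return np.hstack(blocks) / np.sqrt(m)

def quadrance(a, b):
    """Squared Euclidean distance between two vectors."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def hamming(r, s):
    """Number of positions at which two nominal records differ."""
    return sum(x != y for x, y in zip(r, s))

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd iteration using quadrance as the dissimilarity."""
    Z = np.asarray(Z, dtype=float)
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        # Quadrance from every point to every center.
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels, centers

# Toy mixed dataset: two nominal attributes and one continuous attribute.
nominal = [("red", "a"), ("red", "a"), ("blue", "b"), ("blue", "b")]
continuous = np.array([[0.0], [0.1], [5.0], [5.1]])

E = embed_nominal(nominal)
Z = np.hstack([continuous, E])  # continuous part left unscaled here
labels, centers = kmeans(Z, k=2)
```

Concatenating the unscaled continuous columns with the embedded nominal columns is a further simplification; in practice the relative weighting of the two parts would need to be chosen deliberately.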
Pages
221--233
Physical description
Bibliography: 16 items, tables.
Authors
author
  • University of Warmia and Mazury in Olsztyn
  • Faculty of Mathematics and Computer Science, Poland
author
  • Warsaw School of Computer Science, Poland
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-ea091943-a9f8-4727-be8f-3963e7b3c69f