
A Probabilistic Component for K-Means Algorithm and its Application to Sound Recognition

Identifiers
Title variants
PL: Komponent probabilistyczny dla algorytmu K-średnich i jego zastosowanie w rozpoznawaniu dźwięku
Publication languages
EN
Abstracts
EN
In this paper, we present a novel approach to building a probabilistic model of the data set, which is then used by the K-means clustering algorithm. Considering K-means with respect to this probabilistic model requires adopting a probabilistic distance, i.e., a measure of similarity between two probability distributions, as the distance measure. We apply various kinds of probabilistic distances in order to evaluate their effectiveness in the algorithm with the proposed model of the analyzed data. We then report the results of experiments with the discussed clustering algorithm in the field of sound recognition and identify the probabilistic distances that yield the highest clustering performance. As a reference technique, we used the traditional K-means algorithm with the most commonly employed Euclidean distance. Our experiments show that the presented method outperforms the traditional K-means algorithm, regardless of the statistical distance applied.
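The core idea described in the abstract, replacing the Euclidean distance in K-means with a distance between probability distributions, can be sketched as follows. This is a minimal illustration only, assuming each data point is represented as a normalized histogram and using the Hellinger distance (one of the probabilistic distances surveyed in the paper); the function names and the simple deterministic initialization are our own, not the authors' exact model:

```python
import numpy as np

def hellinger(p, q):
    # Hellinger distance between two discrete probability
    # distributions p and q (each a nonnegative vector summing to 1).
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def kmeans_prob(X, k, n_iter=50):
    # X: (n, d) array whose rows are discrete distributions.
    # Crude deterministic initialization: k evenly spaced rows of X.
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centers = X[idx].copy()
    for _ in range(n_iter):
        # Assign each distribution to the nearest center
        # under the Hellinger distance.
        dist = np.array([[hellinger(x, c) for c in centers] for x in X])
        labels = dist.argmin(axis=1)
        # Update each center as the mean distribution of its
        # cluster, renormalized to sum to 1.
        for j in range(k):
            if np.any(labels == j):
                m = X[labels == j].mean(axis=0)
                centers[j] = m / m.sum()
    return labels, centers
```

With two well-separated groups of histograms, the algorithm recovers the grouping; the design point is that only the distance function and the center update change relative to classical K-means, so any of the probabilistic distances discussed in the paper could be substituted for `hellinger`.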
Year
Pages
185–190
Physical description
Bibliography: 35 items, figures, tables
Authors
author
author
Bibliography
  • [1] J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1967, pp. 281–297.
  • [2] C. Chinrungrueng and C.H. Sequin, “Optimal Adaptive K-Means Algorithm with Dynamic Adjustment of Learning Rate,” IEEE Transactions on Neural Networks, vol. 6, no. 1, pp. 157–169, January 1995.
  • [3] S.K. Gupta, K.S. Rao, and V. Bhatnagar, “K-Means Clustering Algorithm for Categorical Attributes,” in Data Warehousing and Knowledge Discovery (DaWaK-99), 1999, pp. 203–208.
  • [4] T. Kanungo, D.M. Mount, N.S. Netanyahu, C.D. Piatko, R. Silverman, and A.Y. Wu, “An Efficient k-Means Clustering Algorithm: Analysis and Implementation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 881–892, July 2002.
  • [5] M. Laszlo and S. Mukherjee, “A Genetic Algorithm Using Hyper-Quadtrees for Low-Dimensional K-means Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 533–543, April 2006.
  • [6] G. Biau, L. Devroye, and G. Lugosi, “On the Performance of Clustering in Hilbert Spaces,” IEEE Transactions on Information Theory, vol. 54, no. 2, pp. 781–790, February 2008.
  • [7] H. Xiong, J. Wu, and J. Chen, “K-Means Clustering Versus Validation Measure: A Data-Distribution Perspective,” IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, vol. 39, no. 2, pp. 318–331, April 2009.
  • [8] H. Chen, J.H. Wang, and X.Z. Wang, “A New Similarity Measure Based on Feature Weight Learning,” in Proceedings of the Second International Conference on Machine Learning and Cybernetics, 2003, pp. 33–36.
  • [9] J.Z. Huang, M.K. Ng, H. Rong, and Z. Li, “Automated Variable Weighting in k-Means Type Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 657–668, May 2005.
  • [10] L. Jing, M.K. Ng, and J.Z. Huang, “An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1026–1041, August 2007.
  • [11] D. E. Rumelhart and D. Zipser, “Feature Discovery by Competitive Learning,” Cognitive Science, vol. 9, no. 1, pp. 75–112, January-March 1985.
  • [12] D. DeSieno, “Adding a Conscience to Competitive Learning,” in Proceedings of the Second IEEE International Conference on Neural Networks (ICNN-88), vol. 1. IEEE, July 1988, pp. 117–124.
  • [13] S.R. Gaddam, V.V. Phoha, and K.S. Balagani, “K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 345–354, March 2007.
  • [14] K. Krishna and M.N. Murty, “Genetic K-Means Algorithm,” IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, vol. 29, no. 3, pp. 433–439, June 1999.
  • [15] M. Xu and P. Franti, “A Heuristic K-Means Clustering Algorithm by Kernel PCA,” in Proceedings of the International Conference on Image Processing (ICIP), 2004, pp. 3503–3506.
  • [16] M. Teboulle, P. Berkhin, I. Dhillon, Y. Guan, and J. Kogan, “Clustering with Entropy-Like k-Means Algorithms,” in Grouping Multidimensional Data. Springer Berlin Heidelberg, 2006, pp. 127–160.
  • [17] I. Csiszar, “Information-Type Measures of Difference of Probability Distributions and Indirect Observations,” Studia Scientiarum Mathematicarum Hungarica, vol. 2, pp. 299–318, 1967.
  • [18] F. Liese and I. Vajda, Convex Statistical Distances. Teubner, 1987.
  • [19] W. Zhong, G. Altun, R. Harrison, P.C. Tai, and Y. Pan, “Improved K-Means Clustering Algorithm for Exploring Local Protein Sequence Motifs Representing Common Structural Property,” IEEE Transactions on Nanobioscience, vol. 4, no. 3, pp. 255–265, September 2005.
  • [20] M. Basseville, “Distance Measures for Signal Processing and Pattern Recognition,” Signal Processing, vol. 18, no. 4, pp. 349–369, December 1989.
  • [21] D. Pollard. (2000) Asymptopia. Book in progress. [Online]. Available: http://www.stat.yale.edu/~pollard/Books/Asymptopia/
  • [22] A.L. Gibbs and F. E. Su, “On Choosing and Bounding Probability Metrics,” International Statistical Review, vol. 70, no. 3, pp. 419–435, February 2002.
  • [23] L. Birge, “Non-Asymptotic Minimax Risk for Hellinger Balls,” Probability and Mathematical Statistics, vol. 5, no. 1, pp. 21–29, 1985.
  • [24] H. Sengar, H. Wang, D. Wijesekera, and S. Jajodia, “Detecting VoIP Floods Using the Hellinger Distance,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 6, pp. 794–805, June 2008.
  • [25] M. Fannes and P. Spincemaille, “The Mutual Affinity of Random Measures,” eprint arXiv:math-ph/0112034v1, December 2001.
  • [26] N. Salkind, Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage, 2007.
  • [27] V.M. Zolotarev, “Probability Metrics,” Theory of Probability and its Applications, vol. 28, no. 2, pp. 278–302, January 1984.
  • [28] A.A. Borovkov, Mathematical Statistics. Gordon and Breach Science Publishers, 1998.
  • [29] P. Diaconis and S. L. Zabell, “Updating Subjective Probability,” Journal of the American Statistical Association, vol. 77, no. 380, pp. 822–830, December 1982.
  • [30] S. Kullback and R. A. Leibler, “On Information and Sufficiency,” The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.
  • [31] C. E. Shannon, “A Mathematical Theory of Communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, October 1948.
  • [32] T.M. Cover and J.A. Thomas, Elements of Information Theory. John Wiley & Sons, 1991.
  • [33] D. Gabor, “Theory of Communication,” Journal of the Institution of Electrical Engineers, vol. 93, no. 26, pp. 429–457, November 1946.
  • [34] J.B. Allen and L. R. Rabiner, “A Unified Approach to Short-Time Fourier Analysis and Synthesis,” in Proceedings of the IEEE, vol. 65, no. 11, November 1977, pp. 1558–1564.
  • [35] J.B. Allen, “Short-Term Spectral Analysis and Synthesis and Modifications by Discrete Fourier Transform,” IEEE Transactions on Acoustics, Speech, Signal Processing, vol. ASSP-25, no. 3, pp. 235–238, June 1977.
Document type
YADDA identifier
bwmeta1.element.baztech-article-BPOK-0031-0036