Fitting a Gaussian mixture model through the Gini index

López-Lobato, Adriana Laura; Avendaño-Garrido, Martha Lorena

doi:10.34768/amcs-2021-0033

Abstract

A Gaussian mixture model is a linear combination of Gaussian components. It is widely used in data mining and pattern recognition. In this paper, we propose a method to estimate the parameters of the density function given by a Gaussian mixture model. Our proposal is based on the Gini index, a measure of the degree of inequality between two probability distributions, and consists of minimizing the Gini index between an empirical distribution for the data and a Gaussian mixture model. We present several examples with simulated and real data and use them to observe some properties of the proposed method.
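
The general idea in the abstract, scoring candidate mixture parameters by a discrepancy between the empirical distribution and the model and then minimizing that score, can be illustrated with a minimal sketch. The abstract does not give the formula of the Gini index, so the objective below is a stand-in chosen for illustration (the integrated absolute difference between the empirical CDF and the mixture CDF), not the authors' method. The function names `gmm_cdf` and `cdf_discrepancy`, the simulated data, and the grid of candidate means are all assumptions.

```python
import math
import random


def gmm_cdf(x, weights, means, stds):
    """CDF of a 1-D Gaussian mixture: sum_k w_k * Phi((x - mu_k) / sigma_k)."""
    return sum(
        w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        for w, m, s in zip(weights, means, stds)
    )


def cdf_discrepancy(data, weights, means, stds, grid_size=400):
    """Midpoint-rule approximation of the integral of |F_n - F_model|.

    Stand-in objective (an assumption), NOT the paper's Gini index.
    The integral is truncated to [min(data) - 1, max(data) + 1].
    """
    xs = sorted(data)
    n = len(xs)
    lo, hi = xs[0] - 1.0, xs[-1] + 1.0
    step = (hi - lo) / grid_size
    total, idx = 0.0, 0
    for i in range(grid_size):
        t = lo + (i + 0.5) * step
        while idx < n and xs[idx] <= t:  # idx counts the samples <= t
            idx += 1
        total += abs(idx / n - gmm_cdf(t, weights, means, stds)) * step
    return total


# Simulated data from a two-component mixture (all values are illustrative).
rng = random.Random(0)
true_w, true_mu, true_sd = [0.4, 0.6], (-2.0, 3.0), [1.0, 1.0]
data = [
    rng.gauss(true_mu[0], true_sd[0]) if rng.random() < true_w[0]
    else rng.gauss(true_mu[1], true_sd[1])
    for _ in range(1000)
]

# Crude grid search over the two means; weights and spreads are held fixed.
candidates = [(float(a), float(b)) for a in range(-4, 5) for b in range(-4, 5) if a < b]
best_mu = min(candidates, key=lambda mu: cdf_discrepancy(data, true_w, mu, true_sd))

d_true = cdf_discrepancy(data, true_w, true_mu, true_sd)
d_wrong = cdf_discrepancy(data, true_w, (0.0, 0.0), true_sd)
print("best means on grid:", best_mu)
print("discrepancy at true vs. wrong means:", d_true, d_wrong)
```

The grid search only varies the component means to keep the example short; a practical fit would also optimize the weights and standard deviations, typically with a numerical optimizer rather than an exhaustive grid.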

Keywords

Gini index problem, Gaussian mixture model, clustering

Publisher

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Journal

International Journal of Applied Mathematics and Computer Science

Year

2021

Volume

Vol. 31, no. 3

Pages

487–500

Physical description

Bibliography: 22 items, tables, charts.

Authors

author

López-Lobato Adriana Laura

adrilau17@gmail.com

Faculty of Mathematics, University of Veracruz, Circuito Gonzalo Aguirre Beltrán S/N, Zona Universitaria, Xalapa, Veracruz, Mexico

author

Avendaño-Garrido Martha Lorena

Faculty of Mathematics, University of Veracruz, Circuito Gonzalo Aguirre Beltrán S/N, Zona Universitaria, Xalapa, Veracruz, Mexico

Notes

Record prepared with funding from the Polish Ministry of Science and Higher Education (MNiSW), agreement no. 461252, under the programme "Social Responsibility of Science", module: Popularisation of science and promotion of sport (2021).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-aa116b7a-8158-401f-9164-d3be5927bde3