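Article title
Authors
Identifiers
Title variants
Porównanie metod inicjalizacji algorytmu EM dla wieloskładnikowych heteroscedastycznych mieszanin rozkładów normalnych (Comparison of EM algorithm initialization methods for multi-component heteroscedastic mixtures of normal distributions)
Publication languages
Abstracts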
A basic approach to estimating the parameters of a mixture model is to maximize the likelihood function with the expectation-maximization (EM) algorithm. The algorithm must, however, be given suitable initial conditions, because its result depends strongly on the first estimate ("guess") of the mixture parameters. This paper presents several methods for estimating initial conditions, which may be used as the first step of the EM parameter estimation procedure, and compares these initialization methods for heteroscedastic, multi-component Gaussian mixtures.
The EM (expectation-maximization) algorithm is a widely used solution to the problem of estimating the parameters of mixtures of probability distributions by maximizing the likelihood. The initial parameters, which form a first approximation of the mixture under study, are essential to how the algorithm behaves. The publication outlines several methods for determining the initial condition for the EM iterations and compares their effectiveness for heteroscedastic, multi-component mixtures of normal distributions.
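The dependence on the starting point described in the abstracts can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce any of the initialization methods compared in the paper: it is a plain EM loop for a univariate mixture in which each component keeps its own variance (the heteroscedastic case), run from two hand-picked starting points. The function names (`sample_mixture`, `em_fit`), the synthetic data, and the variance floor are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the univariate normal distribution N(mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_fit(data, weights, means, variances, n_iter=100, var_floor=1e-6):
    """Run EM from the given initial parameters; return the fit and the log-likelihood trace."""
    weights, means, variances = list(weights), list(means), list(variances)
    k, n = len(weights), len(data)
    trace = [log_likelihood(data, weights, means, variances)]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: weighted updates; every component has its own variance.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # assumed guard against collapsing components
        trace.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, trace


def sample_mixture(n=400, seed=0):
    """Synthetic sample from 0.6*N(0, 1) + 0.4*N(6, 0.5^2); the values are illustrative only."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if rng.random() < 0.6 else rng.gauss(6.0, 0.5)
            for _ in range(n)]


if __name__ == "__main__":
    data = sample_mixture()
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data)
    # Two hypothetical starting points: (weights, means, variances).
    starts = {
        "range-based": ([0.5, 0.5], [min(data), max(data)], [var, var]),
        "near-identical": ([0.5, 0.5], [mean - 0.01, mean + 0.01], [var, var]),
    }
    for name, (w0, m0, v0) in starts.items():
        w, m, v, trace = em_fit(data, w0, m0, v0, n_iter=50)
        print(name, "log-likelihood after 50 iterations:", round(trace[-1], 3))
```

Comparing the two printed log-likelihoods after the same number of iterations gives a rough sense of how much the starting point matters. With more components or stronger overlap between them, different starts may end in different local maxima, which is the situation the initialization methods in the paper address.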
Journal
Year
Volume
Pages
49-73
Physical description
Bibliography: 42 items.
Contributors
author
- Silesian University of Technology, Institute of Electrical Engineering and Informatics, ul. Akademicka 10, 44-100 Gliwice, Poland
author
- Silesian University of Technology, Institute of Informatics, Akademicka 16, 44-100 Gliwice, Poland
Bibliography
- 1. Dempster A. P., Laird N. M., Rubin D. B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc., Ser. B, vol. 39, 1977, p. 1÷38.
- 2. McLachlan G. J., Krishnan T.: The EM Algorithm and Extensions. Wiley, 1997.
- 3. McLachlan G. J., Peel D.: Finite Mixture Models. Wiley, 2000.
- 4. Bohning D., Seidel W.: Recent developments in mixture models. Comput. Statist. Data Anal., 41 (2003), p. 349÷357.
- 5. Recent Developments in Mixture Model, Special Issue. Computational Statistics and Data Analysis Volume 41, Issues 3-4, 28 January 2003.
- 6. Hastie T., Tibshirani R., Friedman J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition, Springer Verlag, Berlin 2009.
- 7. Sociological Methods and Research, Special Issue. Volume 29 Issue 3, 2001.
- 8. Gyllenberg M., Koski T., Lund T.: Applying the EM-algorithm to Classification of Bacteria. Proceedings of the International ICSC Congress on Intelligent Systems and Applications, 2000.
- 9. Mazzocchi M.: Time patterns in UK demand for alcohol and tobacco: an application of the EM algorithm. Computational Statistics and Data Analysis, 50 (9), 2006, p. 2191÷2205.
- 10. Gooya A., Biros G., Davatzikos C.: An EM Algorithm for Brain Tumor Image Registration: A Tumor Growth Modeling Based Approach. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
- 11. Miller M. I., Chen S. C., Kuefler D. A., Davignon D. A.: Maximum Likelihood and the EM Algorithm for 2D NMR Spectroscopy. Journal of Magnetic Resonance, Volume 104, Issue 3, 1993, p. 247÷257.
- 12. Dijkstra M., Roelofsen H., Vonk R. J., Jansen R. C.: Peak quantification in surface-enhanced laser desorption/ionization by using mixture models. Proteomics 2006; 6(19):5106-16.
- 13. Noy K., Fasulo D.: Improved model-based, platform-independent feature extraction for mass spectrometry. Bioinformatics 2007; 23(19):2528-35.
- 14. Davis L.: Handbook of Genetic Algorithms. New York, Van Nostrand Reinhold, 1991.
- 15. McLachlan G. J., Peel D., Basford K. E., Adams P.: The EMMIX software for the fitting of mixtures of normal and t-components. Journal of Statistical Software, 1999, 4(2).
- 16. Biernacki C., Celeux G., Govaert G., Langrognet F.: Model-based cluster and discriminant analysis with the MIXMOD software. Computational Statistics & Data Analysis, 2006, Volume 51, Issue 2, p. 587÷600.
- 17. Richardson S., Green P. J.: On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society, 1997, B 59, p. 731÷792.
- 18. Wu C. F. J.: On the Convergence Properties of the EM Algorithm. The Annals of Statistics, 1983, Vol. 11, No. 1, p. 95÷103.
- 19. Ma J., Xu L.: Asymptotic convergence properties of the EM algorithm with respect to the overlap in the mixture, Neurocomputing 68, 2005, p. 105÷129.
- 20. Xu L., Jordan M.: On Convergence Properties of the EM Algorithm for Gaussian Mixtures. Neural Computation, vol. 8, 1996, p. 129÷151.
- 21. Kiefer J., Wolfowitz J.: Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters, Ann. Math. Statist. 1956, Volume 27, Number 4, p. 887÷906.
- 22. Peters B. C., Walker H. F.: An iterative procedure for obtaining maximum likelihood estimators of the parameters for a mixture of normal distributions. SIAM Journal on Applied Mathematics 35, 1978, p. 362÷378.
- 23. Karlis D., Xekalaki E.: Choosing initial values for the EM algorithm for finite mixtures. Computational Statistics and Data Analysis 41, 2003, p. 577÷590.
- 24. Biernacki C., Celeux G., Govaert G.: Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis, 2003, vol. 41, p. 561÷575.
- 25. Biernacki C.: Initializing EM using the properties of its trajectories in Gaussian mixtures. Statistics and Computing, 2004, vol. 14, p. 267÷279.
- 26. Yang M. S., Lai C. Y., Lin C. Y.: A robust EM clustering algorithm for Gaussian mixture models, Pattern Recognition, vol. 45, 2012, p. 3950÷3961.
- 27. Fayyad U. M., Reina C., Bradley P. S.: Initialization of iterative refinement clustering algorithms. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998, p. 194÷198.
- 28. Ishikawa Y., Nakano R.: Obtaining EM Initial Points by Using the Primitive Initial Point and Subsampling Strategy, Proceedings of International Joint Conference on Neural Networks, 2007, p. 1115÷1120.
- 29. Pereira J. R. G., Cabral C. R. B., Marques L. A., da Costa J. M. J.: An Empirical Comparison of EM Initialization Methods and Model Choice Criteria for Mixtures of Skew-Normal Distributions, Technical Report, 10.01.2012, (http://www.ime.unicamp.br-/sinape/sites/default/files/Pereira_Cabral_Marques_Costa_0.pdf).
- 30. Bessadok A., Hansen P., Rebai A.: EM algorithm and Variable Neighborhood Search for fitting Finite Mixture Model parameters, Proceedings of the International Multiconference on Computer Science and Information Technology, 2009, p. 725÷733.
- 31. Meila M., Heckerman D.: An Experimental Comparison of Several Clustering and Initialization Methods, Microsoft Research Technical report MSR-TR-98-06, UAI 1998 and Machine Learning Journal, 2000, vol. 42, p. 9÷42.
- 32. Maitra R.: Initializing partition-optimization algorithms. IEEE/ACM Transactions on Computational Biology and Bioinformatics 6, 2009, p. 144÷157.
- 33. Reddy C. K., Rajaratnam B.: Learning mixture models via component-wise parameter smoothing, Computational Statistics and Data Analysis, vol. 54, 2010, p. 732÷749.
- 34. Pernkopf F., Bouchaffra D.: Genetic-Based EM Algorithm for Learning Gaussian Mixture Models, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, 2005, p. 1344÷1348.
- 35. Li L., Ma J.: A BYY Split-and-Merge EM Algorithm for Gaussian Mixture Learning. F. Sun et al. (Eds.): ISNN 2008, Part I, LNCS 5263, 2008, p. 600÷609.
- 36. Yao W.: A profile likelihood method for normal mixture with unequal variance. Journal of Statistical Planning and Inference, vol. 140, 2010, p. 2089÷2098.
- 37. Hathaway R. J.: A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Annals of Statistics vol. 13, 1985, p. 795÷800.
- 38. Hathaway R. J.: A constrained EM algorithm for univariate mixtures. Journal of Statistical Computation and Simulation vol. 23, 1986, p. 211÷230.
- 39. Ingrassia S.: A likelihood-based constrained algorithm for multivariate normal mixture models. Statistical Methods & Applications, vol. 13, 2004, p. 151÷590.
- 40. Fisher W. D.: On Grouping for Maximum Homogeneity. Journal of the American Statistical Association, Vol. 53, No. 284. 1958, p. 789÷798.
- 41. Engelman L., Hartigan J. A.: Percentage points of a test for cluster. J Am Stat Assoc vol. 64, 1969, p. 1647÷1648.
- 42. Jensen R. E.: A Dynamic Programming Algorithm for Cluster Analysis, Operations Research, vol. 17, 1969, p. 1034÷1057.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-83dfc935-b7bf-41f9-b3bf-3c2f145a024c