Identifiers
Title variants
Languages of publication
Abstracts
In general, the speech signal can be described by the excitation signal, the impulse response of the vocal tract, and a system modeling the emission of speech through the lips. The characteristics of the vocal tract primarily shape the semantic content of speech. Unfortunately, the irregular periodicity of glottal excitation is a major source of distortions (ripples) in the amplitude spectrum of voiced speech. In this study, a PS-STFT (Pitch-Synchronized Short-Time Fourier Transform) method is proposed to obtain a reliable amplitude spectrum of the vocal tract. A set of cepstral coefficient vectors, PS-HFCC (Pitch-Synchronized Human Factor Cepstral Coefficients), chosen as a representative of the commonly used classical cepstral parameterization methods, was then analyzed to investigate its statistical properties after correction. Additionally, the GMM (Gaussian Mixture Model), widely accepted in speech recognition applications, was chosen as the statistical acoustic model of individual Polish speech phonemes. To evaluate the quality of the proposed method, distances between the multivariate probability distributions represented by the GMMs were calculated. Modifying classical cepstral methods by analyzing variable-length signal frames synchronized to the fundamental period reduced the variance of the cepstral coefficient estimators, which increased the distances between the probability distributions and, consequently, improved classification results.
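The pitch-synchronized framing idea from the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`pitch_synchronized_frames`, `ps_stft_magnitude`), the idealized pitch marks, and the choice of three fundamental periods per frame are all assumptions made here for demonstration; the paper's full PS-STFT pipeline additionally passes the spectrum through an HFCC filter bank to obtain the PS-HFCC vectors.

```python
import numpy as np

def pitch_synchronized_frames(x, pitch_marks, periods_per_frame=3):
    """Cut variable-length frames aligned to pitch marks.

    Hypothetical sketch: each frame spans `periods_per_frame`
    consecutive fundamental periods, so every analysis window
    contains a whole number of glottal cycles.
    """
    frames = []
    for i in range(len(pitch_marks) - periods_per_frame):
        start = pitch_marks[i]
        stop = pitch_marks[i + periods_per_frame]
        frames.append(x[start:stop])
    return frames

def ps_stft_magnitude(frame, n_fft=512):
    """Magnitude spectrum of one pitch-synchronized frame (Hann window)."""
    w = np.hanning(len(frame))
    return np.abs(np.fft.rfft(frame * w, n=n_fft))

# Toy voiced signal: 100 Hz fundamental plus one harmonic, 8 kHz sampling.
fs, f0 = 8000, 100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)

period = fs // f0                       # 80 samples per pitch cycle
marks = np.arange(0, len(x), period)    # idealized, error-free pitch marks

frames = pitch_synchronized_frames(x, marks, periods_per_frame=3)
spec = ps_stft_magnitude(frames[0])     # peak near the 100 Hz bin
```

Because each frame spans a whole number of fundamental periods, the harmonic structure of the excitation stays aligned from frame to frame, which is the mechanism the abstract credits with reducing the variance of the cepstral coefficient estimators.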
Year
Volume
Pages
6
Physical description
Bibliography: 28 items, tables, figures
Authors
author
- Multimedia and Signal Processing, Wroclaw University of Science and Technology, Wroclaw, Poland
author
- Multimedia and Signal Processing, Wroclaw University of Science and Technology, Wroclaw, Poland
Bibliography
- [1] S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980. [Online]. Available: https://doi.org/10.1109/TASSP.1980.1163420
- [2] M. Skowronski and J. Harris, “Improving the filter bank of a classic speech feature extraction algorithm,” in Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS ’03., vol. 4, 2003, pp. IV-IV. [Online]. Available: https://doi.org/10.1109/ISCAS.2003.1205828
- [3] T.-W. Kuan, A.-C. Tsai, P.-H. Sung, J.-F. Wang, and H.-S. Kuo, “A robust BFCC feature extraction for ASR system,” Artificial Intelligence Research, vol. 5, no. 2, 2016. [Online]. Available: https://doi.org/10.5430/air.v5n2p14
- [4] H. Yin, V. Hohmann, and C. Nadeu, “Acoustic features for speech recognition based on gammatone filterbank and instantaneous frequency,” Speech Communication, vol. 53, no. 5, pp. 707- 715, 2011, perceptual and Statistical Audition. [Online]. Available: https://doi.org/10.1016/j.specom.2010.04.008
- [5] N. Moritz, J. Anemüller, and B. Kollmeier, “An auditory inspired amplitude modulation filter bank for robust feature extraction in automatic speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 1926-1937, 2015. [Online]. Available: https://doi.org/10.1109/TASLP.2015.2456420
- [6] L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, 1993.
- [7] H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech,” Acoustical Society of America Journal, vol. 87, no. 4, pp. 1738-1752, 1990.
- [8] J. Koehler, N. Morgan, H. Hermansky, and H. G. Hirsch, “Integrating RASTA-PLP into speech recognition,” in Proceedings of ICASSP ’94. IEEE International Conference on Acoustics, Speech and Signal Processing, 1994, pp. I/421-I/424. [Online]. Available: https://doi.org/10.1109/ICASSP.1994.389266
- [9] H. Hermansky and P. Fousek, “Multi-resolution RASTA filtering for tandem-based ASR,” in Proc. ISCA Interspeech, Lisbon’05, 2005, pp. 361-364.
- [10] M. Skowronski and J. Harris, “Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition,” The Journal of the Acoustical Society of America, vol. 116, pp. 1774-1780, 2004. [Online]. Available: https://doi.org/10.1121/1.1777872
- [11] T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, 1st ed. Upper Saddle River, NJ: Prentice Hall, Oct. 2001.
- [12] B. Atal, “Automatic Speaker Recognition Based on Pitch Contours,” The Journal of the Acoustical Society of America, vol. 52, no. 6B, pp. 1687-1697, 1972.
- [13] S. Gonzalez and M. Brookes, “A pitch estimation filter robust to high levels of noise (PEFAC),” in 19th European Signal Processing Conference, EUSIPCO Barcelona’11, 2011, pp. 451-455.
- [14] D. J. Hermes, “Measurement of pitch by subharmonic summation,” The Journal of the Acoustical Society of America, vol. 83, no. 1, pp. 257-264, 1988. [Online]. Available: https://doi.org/10.1121/1.396427
- [15] T. Drugman and A. Alwan, “Joint robust voicing detection and pitch estimation based on residual harmonics,” in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2011, 2011, pp. 1973-1976.
- [16] A. de Cheveigné and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917-1930, 2002. [Online]. Available: https://doi.org/10.1121/1.1458024
- [17] M. Mauch and S. Dixon, “pYIN: A fundamental frequency estimator using probabilistic threshold distributions,” in Proc. ICASSP 2014, 2014, pp. 659-663. [Online]. Available: https://doi.org/10.1109/ICASSP.2014.6853678
- [18] G. Sharma, K. Umapathy, and S. Krishnan, “Trends in audio signal feature extraction methods,” Applied Acoustics, vol. 158, pp. 1-21, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0003682X19308795
- [19] S. Gmyrek, R. Hossa, and R. Makowski, “Amplitude spectrum correction to improve speech signal classification quality,” International Journal of Electronics and Telecommunications, vol. 70, no. 3, pp. 569-574, 2024. [Online]. Available: https://doi.org/10.24425/ijet.2024.14958
- [20] S. Gmyrek and R. Hossa, “Reducing the impact of fundamental frequency on the HFCC parameters of the speech signal,” in 2023 Signal Processing Symposium (SPSympo), 2023, pp. 49-52.
- [21] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the em algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977. [Online]. Available: http://www.jstor.org/stable/2984875
- [22] S. Kullback, “Information theory and statistics,” Dover Publications, New York, 1968.
- [23] S. Julier and J. Uhlmann, “Unscented filtering and nonlinear estimation,” Proceedings of the IEEE, vol. 92, no. 3, pp. 401-422, 2004. [Online]. Available: https://doi.org/10.1109/JPROC.2003.823141
- [24] J. Goldberger and H. Aronowitz, “A distance measure between GMMs based on the unscented transform and its application to speaker recognition,” in Proc. Interspeech 2005, 2005, pp. 1985-1988. [Online]. Available: https://doi.org/10.21437/Interspeech.2005-624
- [25] W. Jassem, Podstawy fonetyki akustycznej [in Polish: Foundations of acoustic phonetics], ser. Biblioteka mechaniki stosowanej. Państwowe Wydawn. Naukowe, 1973. [Online]. Available: https://books.google.pl/books?id=bCsSGQAACAAJ
- [26] C. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.
- [27] R. Makowski, Automatic speech recognition - selected problems [in Polish: Automatyczne rozpoznawanie mowy - wybrane zagadnienia]. Oficyna Wydawnicza Politechniki Wroclawskiej, 2011.
- [28] S. Gmyrek, R. Hossa, and R. Makowski, “The Influence of the Amplitude Spectrum Correction in the HFCC Parametrization on the Quality of Speech Signal Frame Classification,” Archives of Acoustics, vol. 50, no. 1, pp. 59-67, 2025. [Online]. Available: https://doi.org/10.24425/aoa.2025.153652
Remarks
Record prepared with funds from the Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the programme "Social Responsibility of Science II" - module: Science popularization (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-cb18bbca-d260-4d2b-9a92-8babbaa02296