Article title
Abstract
The speech signal can be described by three key elements: the excitation signal, the impulse response of the vocal tract, and a system representing sound radiation from the lips. The semantic content of speech is carried mainly by the characteristics of the vocal tract. In the parameterization coefficients, however, the irregular periodicity of the glottal excitation disturbs the amplitude spectrum by introducing ripples, and this causes considerable scatter in the feature-vector values. This study proposes a method to mitigate this effect. To this end, inverse filtering was used to estimate the glottal excitation and the vocal-tract transfer function. The resulting parameterization coefficients were then used to build statistical models of individual Polish phonemes as mixtures of Gaussian distributions. Finally, the effect of these corrections on the classification accuracy of Polish vowels was investigated. The proposed modification of the parameterization method met expectations: the scatter of the feature-vector values was reduced.
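The inverse-filtering step can be illustrated with linear prediction. The sketch below is an assumption, not the authors' method: it models the vocal tract as an all-pole filter 1/A(z) estimated by the autocorrelation method with the Levinson-Durbin recursion. Filtering a frame with A(z) leaves a residual that approximates the excitation, while gain/|A(e^jw)| gives a smooth vocal-tract envelope without the harmonic ripples described above. All function names are hypothetical.

```python
import cmath
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations (assumed all-pole vocal-tract model).

    Returns (a, err): a = [1, a1, ..., ap] defines the inverse filter
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Apply A(z) to x; the output is the excitation (residual) estimate."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1)) for n in range(len(x))]


def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z): rebuilds the signal from an excitation."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[k] * y[n - k] for k in range(1, min(len(a) - 1, n) + 1)))
    return y


def envelope_db(a, gain, n_freqs=256):
    """Smooth vocal-tract magnitude response gain/|A(e^jw)| in dB, w in [0, pi]."""
    env = []
    for i in range(n_freqs):
        w = math.pi * i / (n_freqs - 1)
        a_w = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        env.append(20.0 * math.log10(gain / abs(a_w)))
    return env
```

In practice a voiced frame would usually be pre-emphasized and windowed before `autocorrelation`, with an order of roughly the sampling rate in kHz plus 2 and a per-sample gain of `sqrt(err / len(frame))`; these are common rules of thumb, not details taken from the article.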
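The phoneme-modelling and classification step can be sketched in the same hedged spirit. Assuming diagonal-covariance Gaussian components (the abstract does not state the covariance structure) and models already trained, for example with the expectation-maximization (EM) algorithm, a sequence of feature vectors is assigned to the phoneme whose mixture yields the highest total log-likelihood. The function names and the toy parameters are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed structure)."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """gmm is a list of (weight, mean, var) components; returns log p(x).

    Uses log-sum-exp so that very small component densities do not underflow.
    """
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(frames, models):
    """Return the label whose GMM gives the highest summed log-likelihood over frames."""
    return max(models, key=lambda label: sum(gmm_log_likelihood(f, models[label])
                                             for f in frames))


if __name__ == "__main__":
    # Toy two-dimensional models for two hypothetical vowel classes.
    models = {
        "a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
        "i": [(1.0, [5.0, 5.0], [1.0, 1.0])],
    }
    print(classify([[0.2, 0.1], [0.9, 1.2]], models))  # prints a
```

Summing per-frame log-likelihoods treats the frames as independent, which is a simplifying assumption of this sketch.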
Pages
569–574
Physical description
Bibliography: 18 items, figures.
Authors
author
- Department of Acoustics, Multimedia and Signal Processing, Wroclaw University of Science and Technology, Wroclaw, Poland
author
- Department of Acoustics, Multimedia and Signal Processing, Wroclaw University of Science and Technology, Wroclaw, Poland
author
- Department of Acoustics, Multimedia and Signal Processing, Wroclaw University of Science and Technology, Wroclaw, Poland
Bibliography
- [1] T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, 1st ed. Upper Saddle River, NJ: Prentice Hall, Oct. 2001.
- [2] J. Walker and P. Murphy, “A Review of Glottal Waveform Analysis,” Jan. 2005, vol. 4391, 21 pp. [Online]. Available: https://doi.org/10.1007/978-3-540-71505-4_1.
- [3] T. Drugman, B. Bozkurt, and T. Dutoit, “A comparative study of glottal source estimation techniques,” Computer Speech & Language, vol. 26, pp. 20–34, Jan. 2012. [Online]. Available: https://doi.org/10.1016/j.csl.2011.03.003.
- [4] D. Wong, J. Markel, and A. Gray, “Least squares glottal inverse filtering from the acoustic speech waveform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 4, pp. 350–355, Aug. 1979. [Online]. Available: https://doi.org/10.1109/TASSP.1979.1163260.
- [5] M. Plumpe, T. Quatieri, and D. Reynolds, “Modeling of the glottal flow derivative waveform with application to speaker identification,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 5, pp. 569–586, 1999. [Online]. Available: https://doi.org/10.1109/89.784109.
- [6] P. Alku, “Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering,” Speech Communication, vol. 11, no. 2, pp. 109–118, Jun. 1992. [Online]. Available: https://doi.org/10.1016/0167-6393(92)90005-R.
- [7] K. Syed and T. Qureshi, “A New Approach to Parametric Modeling of Glottal Flow,” Archives of Acoustics, vol. 36, no. 4, pp. 695–712, 2011.
- [8] J. Goldberger and H. Aronowitz, “A distance measure between GMMs based on the unscented transform and its application to speaker recognition,” in Proc. Interspeech 2005, Sep. 2005, pp. 1985–1988. [Online]. Available: https://doi.org/10.21437/Interspeech.2005-624.
- [9] R. Makowski, Automatyczne rozpoznawanie mowy: wybrane zagadnienia. Oficyna Wydawnicza Politechniki Wrocławskiej, 2011. [Online]. Available: https://books.google.pl/books?id=qv5vMwEACAAJ.
- [10] H. Yin, V. Hohmann, and C. Nadeu, “Acoustic features for speech recognition based on gammatone filterbank and instantaneous frequency,” Speech Communication, vol. 53, no. 5, pp. 707–715, 2011, Perceptual and Statistical Audition. [Online]. Available: https://doi.org/10.1016/j.specom.2010.04.008.
- [11] T.-W. Kuan, A.-C. Tsai, P.-H. Sung, J.-F. Wang, and H.-S. Kuo, “A robust BFCC feature extraction for ASR system,” Artificial Intelligence Research, vol. 5, Jan. 2016. [Online]. Available: https://doi.org/10.5430/air.v5n2p14.
- [12] G. Sharma, K. Umapathy, and S. Krishnan, “Trends in audio signal feature extraction methods,” Applied Acoustics, vol. 158, p. 107020, 2020.
- [13] M. Skowronski and J. Harris, “Improving the filter bank of a classic speech feature extraction algorithm,” in Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS ’03), vol. 4, 2003. [Online]. Available: https://doi.org/10.1109/ISCAS.2003.1205828.
- [14] M. Skowronski and J. Harris, “Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition,” The Journal of the Acoustical Society of America, vol. 116, pp. 1774–1780, Oct. 2004. [Online]. Available: https://doi.org/10.1121/1.1777872.
- [15] T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, “HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 153–165, Jan. 2011. [Online]. Available: https://doi.org/10.1109/TASL.2010.2045239.
- [16] N. Henrich, C. d’Alessandro, B. Doval, and M. Castellengo, “On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation,” The Journal of the Acoustical Society of America, vol. 115, no. 3, pp. 1321–1332, Feb. 2004. [Online]. Available: https://doi.org/10.1121/1.1646401.
- [17] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977. [Online]. Available: http://www.jstor.org/stable/2984875.
- [18] R. Hossa and R. Makowski, “An effective speaker clustering method using UBM and ultra-short training utterances,” Archives of Acoustics, vol. 41, Mar. 2016. [Online]. Available: https://doi.org/10.1515/aoa-2016-0011.
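Notes
Record prepared with funding from the Polish Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the programme "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II), module: Popularisation of Science (2025).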
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-2b3886db-81f6-42b5-b580-d97ec23e1161