

Article title

Unsupervised Phoneme Segmentation Based on Main Energy Change for Arabic Speech

Authors
Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
In this paper, a new method for segmenting speech at the phoneme level is presented. For this purpose, the author uses the short-time Fourier transform of the speech signal. The goal is to identify the locations of the main energy changes in frequency over time, which can be interpreted as phoneme boundaries. To gain further precision, the spectrum is divided into frequency bands and energy changes are searched for within each band, so that vowel and consonant segments whose energy is confined to a small number of narrow spectral areas can also be identified. The method uses only the power spectrum of the signal for segmentation: no parameter adaptation or speaker-specific training is required in advance, no transcript or prior linguistic knowledge about the phonemes is needed, and no voiced/unvoiced decision making is required. Segmentation results obtained with the proposed method have been compared with a manual segmentation and with three similar segmentation methods. These results show that 81% of the boundaries are successfully identified. This research aims to improve the acoustic parameters for Arabic speech processing systems.
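The core idea described in the abstract — detecting phoneme boundaries at peaks of frame-to-frame spectral energy change — can be sketched in a few lines of Python. This is a minimal illustration of the general approach, not the author's exact algorithm; the frame length, hop size, and peak-picking threshold are assumed values, and no per-band analysis is performed.

```python
import numpy as np

def stft_power(signal, frame_len=400, hop=160):
    """Short-time power spectrum via a Hann-windowed FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def energy_change_boundaries(signal, sr=16000, frame_len=400, hop=160,
                             threshold=0.5):
    """Candidate phoneme boundaries at peaks of spectral energy change.

    Returns boundary times in seconds. `threshold` is a fraction of the
    maximum of the change curve (an assumed tuning parameter).
    """
    power = stft_power(signal, frame_len, hop)
    log_power = np.log(power + 1e-10)
    # Frame-to-frame change of the log power spectrum, summed over frequency.
    change = np.sum(np.abs(np.diff(log_power, axis=0)), axis=1)
    # Local maxima above the threshold mark candidate boundaries.
    peaks = [i for i in range(1, len(change) - 1)
             if change[i] > change[i - 1] and change[i] >= change[i + 1]
             and change[i] > threshold * change.max()]
    return np.array(peaks) * hop / sr

# Example: a synthetic "two-phoneme" signal switching frequency at 0.5 s.
sr = 16000
t = np.arange(sr) / sr
sig = np.where(t < 0.5, np.sin(2 * np.pi * 300 * t),
               np.sin(2 * np.pi * 1500 * t))
print(energy_change_boundaries(sig, sr))
```

On this stationary-tone example the change curve is nearly flat except at the frequency switch, so the detected boundary falls near 0.5 s; real speech would require the band-wise refinement the abstract describes.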
Year
Volume
Pages
12--20
Physical description
Bibliography: 35 items, figures, tables
Contributors
author
  • Department of Computer Science Faculty of Exact and Applied Sciences University of Oran 1 Ahmed Ben Bella Oran, Algeria
Bibliography
  • [1] K. Vicsi and D. Sztahó, “Recognition of emotions on the basis of different levels of speech segments”, J. of Adv. Comput. Intell. and Intelligent Inform., vol. 16, no. 2, pp. 335–340, 2012.
  • [2] K. Vicsi, D. Sztahó, and G. Kiss, “Examination of the sensitivity of acoustic-phonetic parameters of speech to depression”, in Proc. 3rd IEEE Int. Conf. on Cognitive Infocommun. CogInfoCom 2012, Kosice, Slovakia, 2012, pp. 511–515 (doi: 10.1109/CogInfoCom.2012.6422035).
  • [3] K. Vicsi, V. Imre, and G. Kiss, “Improving the classification of healthy and pathological continuous speech”, in Proc. 15th Int. Conf. Text, Speech and Dialogue TSD 2012, Brno, Czech Republic, 2012, pp. 581–588.
  • [4] J. P. Goldman, “EasyAlign: An automatic phonetic alignment tool under Praat”, in Proc. 12th Ann. Conf. of the Int. Speech Commun. Assoc. Interspeech 2011, Florence, Italy, 2011.
  • [5] B. Bigi and D. Hirst, “Speech phonetization alignment and syllabication (SPPAS): A tool for the automatic analysis of speech prosody”, in Proc. 6th Int. Conf. Speech Prosody, Shanghai, China, 2012.
  • [6] S. Brognaux and T. Drugman, “HMM-based speech segmentation: Improvements of fully automatic approaches”, IEEE/ACM Trans. on Audio, Speech, and Lang. Process., vol. 24, no. 1, pp. 5–15, 2016.
  • [7] G. Gosztolya and L. Toth, “Detection of phoneme boundaries using spiking neurons”, in Proc. 9th Int. Conf. on Artif. Intell. and Soft Comput. ICAISC 2008, Zakopane, Poland, 2008, pp. 782–793.
  • [8] E. C. Zsiga, The Sounds of Language: An Introduction to Phonetics and Phonology. Chichester, UK: Wiley, 2012.
  • [9] M. Malcangi, “Soft computing approach to segmentation of speech in phonetic units”, Int. J. of Computers and Commun., vol. 3, no. 3, pp. 41–48, 2009.
  • [10] G. Kiss, D. Sztahó, and K. Vicsi, “Language independent automatic speech segmentation into phoneme-like units on the base of acoustic distinctive features”, in Proc. 4th IEEE Int. Conf. on Cognitive Infocommun. CogInfoCom 2013, Budapest, Hungary, 2013, pp. 579–582.
  • [11] A. Stolcke et al., “Highly accurate phonetic segmentation using boundary correction models and system fusion”, in Proc. of IEEE Int. Conf. on Acoust., Speech and Signal Process. ICASSP 2014, Florence, Italy, 2014, pp. 5552–5556 (doi: 10.1109/ICASSP.2014.6854665).
  • [12] O. Scharenborg, V. Wan, and M. Ernestus, “Unsupervised speech segmentation: An analysis of the hypothesized phone boundaries”, J. of the Acoust. Soc. of America, vol. 127, no. 2, pp. 1084–1095, 2010 (doi: 10.1121/1.3277194).
  • [13] M. Sharma and R. Mammone, “Blind speech segmentation: Automatic segmentation of speech without linguistic knowledge”, in Proc. of Int. Conf. on Spoken Lang. Process. ICSLP 96, Philadelphia, USA, 1996, pp. 1237–1240.
  • [14] S. Dusan and L. Rabiner, “On the relation between maximum spectral transition positions and phone boundaries”, in Proc. 9th Int. Conf. on Spoken Lang. Process. INTERSPEECH 2006 – ICSLP, Pittsburgh, PA, USA, 2006, pp. 645–648.
  • [15] Y. A. Alotaibi and S. A. Selouani, “Evaluating the MSA West Point Speech Corpus”, Int. J. of Comp. Process. of Lang., vol. 22, no. 4, pp. 285–304, 2009.
  • [16] O. A. A. Ali, M. M. Moselhy, and A. Bzeih, “A comparative study of Arabic speech recognition”, in Proc. 16th IEEE Mediterranean in Electrotech. Conf. MELECON 2012, Hammamet, Tunisia, 2012.
  • [17] F. Biadsy, J. Hirschberg, and N. Habash, “Spoken Arabic dialect identification using phonotactic modeling”, in Proc. of Worksh. on Computat. Approaches to Semitic Lang., Athens, Greece, pp. 53–61, 2009.
  • [18] N. Hajj and M. Awad, “Weighted entropy cortical algorithms for isolated Arabic speech recognition”, in Proc. Int. Joint Conf. on Neural Netw. IJCNN 2013, Dallas, TX, USA, 2013 (doi: 10.1109/IJCNN.2013.6706753).
  • [19] J. F. Bonnot, “Experimentale de certains aspects de la gémination et de l’emphase en Arabe”, Travaux de l’Institut Phonétique de Strasbourg, vol. 11, pp. 109–118, 1979 (in French).
  • [20] M. Alkhouli, “Alaswaat Alaghawaiyah”, Daar Alfalah, Jordan, 1990 (in Arabic).
  • [21] A. Biswas, P. K. Sahu, A. Bhowmick, and M. Chandra, “Admissible wavelet packet sub-band-based harmonic energy features for Hindi phoneme recognition”, J. of IET Sig. Process., vol. 9, no. 6, pp. 511–519, 2015.
  • [22] A. Nabil and M. Hesham, “Formant distortion after codecs for Arabic”, in Proc. 4th Int. Symp. on Commun. Control and Sig. Process. ISCCSP 2010, Limassol, Cyprus, 2010, pp. 1–5.
  • [23] J. R. Deller Jr., J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals. Wiley, 2000.
  • [24] Y. M. Seddiq and Y. A. Alotaibi, “Formant based analysis of vowels in Modern Standard Arabic – Preliminary results”, in Proc. 11th Int. Conf. on Inform. Sci., Sig. Process. and their Appl. ISSPA 2012, Montreal, QC, Canada, 2012, pp. 689–694.
  • [25] L. Besacier, J. Bonastre, and C. Fredouille, “Localization and selection of speaker-specific information with statistical modeling”, Speech Commun., vol. 31, pp. 89–106, 2000.
  • [26] S. Safavi, A. Hanani, M. Russell, P. Jancovic, and M. J. Carey, “Contrasting the effects of different frequency bands on speaker and accent identification”, IEEE Sig. Process. Lett., vol. 19, no. 12, pp. 829–832, 2012.
  • [27] A. M. Selmini and F. Violaro, “Acoustic-phonetic features for refining the explicit speech segmentation”, in Proc. 8th Ann. Conf. Int. Speech Commun. Assoc. INTERSPEECH 2007, Antwerp, Belgium, 2007, pp. 1314–1317.
  • [28] A. M. Selmini and F. Violaro, “Improving the explicit automatic speech segmentation provided by HMMs”, in Proc. of the Int. Worksh. on Telecommun. IWT 2007, Santa Rita do Sapucaí, Brazil, 2007, pp. 220–226.
  • [29] M. A. Ben Messaoud, A. Bouzid, and N. Ellouze, “Automatic segmentation of the clean speech signal”, Int. J. of Elec., Comp., Energe., Electron. & Commun. Engin., vol. 9, no. 1, pp. 114–117, 2015.
  • [30] A. Juneja, “Speech recognition based on phonetic features and acoustic landmarks”, PhD Thesis, University of Maryland, College Park, USA, 2004.
  • [31] S. B. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”, IEEE Trans. Acoust., Speech & Sig. Process., vol. 28, no. 4, pp. 357–366, 1980.
  • [32] O. J. Rasanen, U. K. Laine, and T. Altosaar, “An improved speech segmentation quality measure: the R-value”, in Proc. 10th Ann. Conf. of the Int. Speech Commun. Assoc. INTERSPEECH 2009, Brighton, UK, 2009, pp. 1851–1854.
  • [33] S. Potisuk, “A novel method for blind segmentation of Thai continuous speech”, in Proc. of IEEE Sig. Process. & Signal Process. Edu. Worksh. SP/SPE 2015, Snowbird, UT, USA, 2015, pp. 415–420.
  • [34] S. King and M. Hasegawa-Johnson, “Accurate speech segmentation by mimicking human auditory processing”, in Proc. of IEEE Int. Conf. on Acoust., Speech & Sig. Process. ICASSP 2013, Vancouver, BC, Canada, 2013, pp. 8096–8100.
  • [35] D.-T. Hoang and H.-C. Wang, “A phone segmentation method and its evaluation on Mandarin speech corpus”, in Proc. of 8th Int. Symp. on Chinese Spoken Lang. Process. ISCSLP 2012, Hong Kong, China, 2012, pp. 373–377.
Notes
Prepared with funds of the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for activities popularizing science (2017 tasks).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f813faed-fad0-4291-8f2f-28011d4b506a