Basic parameters in speech processing. The need for evaluation

Höge, H.

Article details

Article title

Basic parameters in speech processing. The need for evaluation

Authors

Höge H.

Abstract

We regard pitch, duration, intensity, voice quality, signal-to-noise ratio, voice activity detection, and the strength of the Lombard effect as basic parameters in speech processing. For many published algorithms that extract these parameters automatically from the speech signal, performance is unknown, particularly under adverse conditions. A framework based on competitive evaluation is proposed to drive algorithmic research and to make progress comparable.

Keywords

prosodic parameters; VAD; strength of Lombard effect; evaluation

Publisher

Instytut Podstawowych Problemów Techniki PAN
Komitet Akustyki PAN
Polskie Towarzystwo Akustyczne

Journal

Archives of Acoustics

Year

2007

Volume

Vol. 32, No. 1

Pages

67–74

Physical description

Bibliography: 19 items, figures

Contributors

Author

Höge H.

Siemens AG, Corporate Technology Otto Hahn Ring 6, 81739 München, Germany, harald.hoege@siemens.com

References

[1] HUNT A., BLACK A., Unit selection in a concatenative speech synthesis system using a large speech database, Proc. ICASSP 96, 373–376, 1996.
[2] DONOVAN R., ITTYCHERIAH A., FRANZ M., RAMABHADRAN B., EIDE E., VISWANATHAN M., BAKIS R., HAMZA W., PICHENY M., GLEASON P., RUTHERFOORD T., COX P., GREEN D., JANKE E., REVELIN S., WAAST C., ZELLER B., GUENTHER C., KUNZMANN J., Current status of the IBM trainable speech synthesis system, Proc. Fourth ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis (SSW-4), August 29 – September 1, Perthshire, Scotland 2001.
[3] SHRIBERG E., FERRER L., KAJAREKAR S., VENKATARAMAN A., STOLCKE A., Modeling prosodic feature sequences for speaker recognition, Speech Communication, 46, 3–4, 455–472 (2005).
[4] FERRER L., BRATT H., GADDE V. R., KAJAREKAR S., SHRIBERG E., SONMEZ K., STOLCKE A., VENKATARAMAN A., Modeling duration patterns for speaker recognition, Proc. Eurospeech, 2017–2020, 2003.
[5] WILLETT D., GERL F., BRUECKNER R., Discriminatively trained context-dependent duration-bigram models for Korean digit recognition, Proc. Int. Conference on Acoustics, Speech and Signal Processing, ICASSP'06, pp. I-25–I-28, 2006.
[6] GAROFOLO J. S., FISCUS J. G., FISHER W.M., Design and preparation of the 1996 Hub-4 broadcast news benchmark test corpora, Proc. DARPA Speech Recognition Workshop, February 1997.
[7] NIST 2005 Speaker Recognition Evaluation Plan, http://www.nist.gov/speech/tests/spk/2005/sre-05_evalplan-v6.pdf
[8] TC-Star first and second evaluation campaign, www.tc-star.org
[9] HIRSCH H.-G., PEARCE D., The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions, Proc. ISCA Tutorial and Research Workshop ASR 2000 – Automatic Speech Recognition: Challenges for the Next Millennium, Paris 2000.
[10] KOTNIK B., HÖGE H., KACIC Z., Evaluation of pitch detection algorithms in adverse conditions, Proc. Speech Prosody 2006, Dresden 2006.
[11] HÖGE H., KOTNIK B., KACIC Z., PFITZINGER H. R., Evaluation of pitch marking algorithms, Proc. ITG-Fachtagung Sprachkommunikation, Kiel 2006.
[12] ISKRA D. J., GROSSKOPF B., MARASEK K., VAN DEN HEUVEL H., DIEHL F., KIESSLING A., SPEECON speech databases for consumer devices: Database specification and validation, Proc. Second Int. Conference on Language Resources and Evaluation (LREC'2002), pp. 329–333, Las Palmas 2002.
[13] ELDA catalogue No.: ELDA-S0218; includes construction and short description of the PMA/PDA Reference Database.
[14] PFITZINGER H. R., Local speech rate perception in German speech, Proc. of the XIV-th Int. Congress of Phonetic Sciences, Vol. 2, pp. 893–896, San Francisco 1999.
[15] ADELL J., AGÜERO P. D., BONAFONTE A., Database pruning for unsupervised building of text-to-speech voices, Proc. ICASSP, I-889–I-892, 2006.
[16] MATEJKA P., SCHWARZ P., CERNOCKY J., CHYTIL P., Phonotactic language identification using high quality phoneme recognition, Proc. Eurospeech 2005, pp. 2237–2240, Lisbon 2005.
[17] ETSI EN 300 965 V8.0.1 (2000-11) specification: Voice Activity Detector (VAD) for full rate speech traffic channels (GSM 06.32 version 8.0.1 Release 1999).
[18] ANDRASSY B., HÖGE H., Human and machine recognition as a function of SNR, Proc. LREC, 2006.
[19] BORIL H., POLLAK P., Design and collection of Czech Lombard speech database, Proc. ISCA Interspeech, vol. 1, 1577–1580, Lisbon 2005.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BAT8-0003-0061