Text to speech synthesis system with multi voice capability based on instantaneous voice conversion

Azarov, E.; Petrovsky, A.; Zubrycki, P.

Title variants

System syntezy mowy z możliwością użycia wielu głosów oparty o metodę ciągłej konwersji

Abstract

The paper describes an approach to text-to-speech synthesis based on processing in the harmonic domain. A dedicated harmonic analysis technique is presented that provides accurate estimates of instantaneous harmonic parameters. The technique relies on narrow-band filtering aligned to the fundamental frequency, which improves estimation accuracy for higher-order harmonics whose frequencies change rapidly. Because the analysis separates the deterministic and stochastic components precisely, it yields natural-sounding speech with well-matched harmonic amplitudes, pitch and phases. Synthesis uses a parametric representation, which makes it possible to apply voice conversion techniques and thus obtain a multi-voice synthesis system from the acoustic database of a single speaker.
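The idea of narrow-band filtering aligned to the fundamental frequency can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the f0 contour is already known for every sample, uses a plain windowed-sinc low-pass filter, and the function name `harmonic_track` and the parameters `bandwidth_hz` and `taps` are hypothetical. The signal is demodulated along the phase track of the k-th harmonic, so that harmonic stays at 0 Hz even while the pitch glides, and a narrow low-pass filter then isolates it.

```python
import numpy as np

def harmonic_track(signal, f0, k, fs, bandwidth_hz=40.0, taps=255):
    """Estimate instantaneous amplitude and phase of the k-th harmonic.

    signal : real-valued samples
    f0     : fundamental frequency in Hz, one value per sample (assumed known)
    k      : harmonic number (1 = fundamental)
    fs     : sampling rate in Hz
    """
    # Phase track of the k-th harmonic: running integral of k * f0.
    carrier_phase = 2.0 * np.pi * np.cumsum(k * np.asarray(f0)) / fs

    # Demodulate along the track: the k-th harmonic moves to 0 Hz,
    # regardless of how quickly its frequency changes.
    baseband = signal * np.exp(-1j * carrier_phase)

    # Narrow low-pass FIR filter (windowed sinc, cutoff = bandwidth_hz / 2),
    # normalised to unit gain at 0 Hz.
    n = np.arange(taps) - (taps - 1) / 2.0
    h = np.sinc(bandwidth_hz / fs * n) * np.hamming(taps)
    h /= h.sum()

    # Factor 2 restores the amplitude of the real cosine component.
    envelope = 2.0 * np.convolve(baseband, h, mode="same")

    amplitude = np.abs(envelope)
    phase = carrier_phase + np.angle(envelope)
    return amplitude, phase
```

For a synthetic signal whose pitch glides from 100 Hz to 150 Hz, calling `harmonic_track(x, f0, 2, fs)` should return an amplitude close to that of the second harmonic away from the signal edges. A fixed-frequency band-pass filter of the same width would lose the harmonic as it drifts out of the passband.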

Keywords

text to speech synthesis; speech conversion; harmonic analysis

Publisher

Wydawnictwo SIGMA-NOT

Journal

Elektronika : konstrukcje, technologie, zastosowania

Year

2011

Volume

Vol. 52, no. 5

Pages

111-116

Physical description

Bibliography: 14 items, charts.

Authors

author

Azarov E.

author

Petrovsky A.

author

Zubrycki P.

Belarusian State University of Informatics and Radioelectronics, Department of Computer Engineering, Minsk, Belarus

Bibliography

[1] Moulines E. and Charpentier F.: Pitch Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones. Speech Communication, vol. 9, no. 5-6, pp. 453-467, 1990.
[2] Quatieri T. and McAulay R.: Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. on ASSP, vol. 34, no. 4, pp. 744-754, August 1986.
[3] Dutoit T.: An Introduction to Text-to-speech Synthesis. Kluwer Academic Publishers, the Netherlands, 285 p., 1997.
[4] Boashash B.: Estimating and Interpreting the Instantaneous Frequency of a Signal. Proceedings of the IEEE, vol. 80, no. 4, pp. 520-563, 1992.
[5] Maragos P., Kaiser J. F., Quatieri T. F.: Energy Separation in Signal Modulations with Application to Speech Analysis. IEEE Trans. on Signal Process., vol. 41, no. 10, pp. 3024-3051, 1993.
[6] Azarov E., Petrovsky A., Parfieniuk M.: Estimation of the instantaneous harmonic parameters of speech. In Proc. EUSIPCO-2008, Lausanne, August 2008.
[7] Zhang F., Bi G., Chen Y. Q.: Harmonic transform. IEE Proc.-Vis. Image Signal Process., vol. 151, no. 4, pp. 257-264, August 2004.
[8] Weruaga L., Kepesi M.: The fan-chirp transform for non-stationary harmonic signals. Signal Processing, vol. 87, no. 6, pp. 1-18, June 2007.
[9] Gabor D.: Theory of communication. Proc. IEE, vol. 93, no. 3, pp. 429-457, 1946.
[10] Azarov E. and Petrovsky A.: Instantaneous harmonic analysis for vocal processing. In Proc. DAFx-09, Como, Italy, September 1-4, 2009.
[11] Levine S. and Smith J.: A Sines+Transients+Noise Audio Representation for Data Compression and Time/Pitch Scale Modifications. AES 105th Convention (San Francisco, CA, USA), Preprint 4781, September 1998.
[12] Stylianou Y., Cappe O. and Moulines E.: Continuous Probabilistic Transform for Voice Conversion. IEEE Trans. on Speech and Audio Processing, vol. 6, no. 2, pp. 131-142, 1998.
[13] Lobanov B. and Tsirulnik L.: Computer synthesis and cloning of speech. Minsk, "Belarusian Science", 2008 (in Russian).
[14] Azarov E. and Petrovsky A.: Text and speaker independent voice conversion. In Proc. Pattern Recognition and Information Processing, Minsk, Belarus, May 19-21, 2009, pp. 195-198.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BWAK-0024-0021