Article title

Voice Conversion Using A Two-Factor Gaussian Process Latent Variable Model

Authors
Identifiers
Title variants
PL
Dwuwskaźnikowa metoda GPLVM w procesie konwersji głosu
Publication languages
EN
Abstracts
EN
This paper presents a novel strategy for voice conversion that solves the style and content separation task with a two-factor Gaussian Process Latent Variable Model (GP-LVM). A generative model for speech is built from the interaction of style and content, which represent the speaker's individual characteristics and the semantic information, respectively. The interaction is captured by a GP-LVM with two latent variables, together with a GP mapping to the observations. For a given collection of labelled observations, the separation task is then accomplished by fitting the model with the maximum-likelihood method. Finally, voice conversion is implemented by style alternation: the desired speech is reconstructed from the decomposed target-speaker style and the source-speech content, using the learned model as a prior. Both objective and subjective test results show the advantage of the proposed method over the traditional GMM-based mapping system when the training data are limited. Furthermore, the experimental results indicate that a GP-LVM with nonlinear kernel functions outperforms one with linear kernels for voice conversion, owing to its better ability to capture the interaction between style and content; a rich variety of the two factors in the training set also helps to improve conversion performance.
PL
The paper describes a new strategy for voice conversion through the separation of style and content, using a two-factor GPLVM (Gaussian Process Latent Variable Model). The experiments show that the proposed algorithm outperforms the traditionally used GMM-based mapping system when the amount of training data is limited. It is also shown that a GPLVM performs better in voice conversion with a nonlinear kernel function than with a linear one.
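
The abstract describes the construction only at a high level. The sketch below is a minimal plain NumPy/SciPy illustration of the general two-factor GP-LVM idea as described there: a product kernel over a per-speaker style latent and a per-frame content latent, latents and hyperparameters fitted by maximizing the GP marginal likelihood, and conversion performed by swapping in the target speaker's style while keeping the source content. All names, latent dimensions, kernels, and the toy data are assumptions for illustration only; this is not the authors' implementation and it omits the real speech features and frame alignment.

# Minimal sketch of a two-factor GP-LVM (assumptions throughout; not the
# paper's implementation): style is tied per speaker, content is per frame.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, D = 40, 8            # frames x feature dimension (toy sizes)
Qs, Qx = 2, 3           # style / content latent dimensions (assumed)
Y = rng.standard_normal((N, D))       # placeholder for aligned speech features
speaker = np.repeat([0, 1], N // 2)   # frame i was spoken by speaker[i]

def kernel(S, X, theta):
    # Product RBF kernel over (style, content) pairs, plus noise on the
    # diagonal; the product makes the two factors interact multiplicatively.
    amp, ls, lx, noise = np.exp(theta)
    ds = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    dx = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = amp * np.exp(-0.5 * ds / ls - 0.5 * dx / lx)
    return K + (noise + 1e-6) * np.eye(len(S))

def unpack(p):
    styles = p[:2 * Qs].reshape(2, Qs)            # one style vector per speaker
    X = p[2 * Qs:2 * Qs + N * Qx].reshape(N, Qx)  # one content vector per frame
    return styles, X, p[2 * Qs + N * Qx:]

def neg_log_marginal(p):
    # Negative GP log marginal likelihood of Y, with D independent output
    # dimensions sharing one kernel (the standard GP-LVM objective).
    styles, X, theta = unpack(p)
    K = kernel(styles[speaker], X, theta)
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * np.sum(Y * np.linalg.solve(K, Y)) + 0.5 * D * logdet

p0 = np.concatenate([0.1 * rng.standard_normal(2 * Qs + N * Qx), np.zeros(4)])
res = minimize(neg_log_marginal, p0, method="L-BFGS-B",
               options={"maxiter": 200})
styles, X, theta = unpack(res.x)

# "Style alternation": keep the source frames' content latents, swap in the
# target speaker's style, and reconstruct with the GP posterior mean.
S_tr = styles[speaker]
src = speaker == 0
stacked_S = np.vstack([np.tile(styles[1], (src.sum(), 1)), S_tr])
stacked_X = np.vstack([X[src], X])
K_big = kernel(stacked_S, stacked_X, theta)
K_star = K_big[:src.sum(), src.sum():]            # noise-free cross-covariance
K_tr = kernel(S_tr, X, theta)
Y_converted = K_star @ np.linalg.solve(K_tr, Y)   # converted feature frames
print(Y_converted.shape)                          # (20, 8)

In the paper's actual system the observations would be spectral feature vectors from aligned source and target utterances rather than random data, and the kernel choice matters: per the abstract, nonlinear kernels capture the style/content interaction better than linear ones.
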
Year
Pages
318-324
Physical description
Bibliography: 26 items, figures, tables
Contributors
author
author
author
author
author
  • Institute of Communications Engineering, PLA Univ. of Sci. & Tech., Biaoyin 2, Yudao Street, Nanjing, China, 210007, sunxj99@hotmail.com
Bibliography
  • [1] E. Moulines, Y. Sagisaka, et al., Voice Conversion: State of the Art and Perspectives, special issue of Speech Communication, 16 (1995), No. 2, 125-224
  • [2] Daniel Erro, Asunción Moreno, and Antonio Bonafonte, Voice Conversion Based on Weighted Frequency Warping, IEEE Trans. on Speech and Audio Processing, 18(2010), No. 5, 922-931
  • [3] M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, Voice conversion through vector quantization, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1988, 655-658
  • [4] K. Shikano, S. Nakamura, and M. Abe, Speaker adaptation and voice conversion by codebook mapping, in Proc. IEEE Int. Symp. Circuits Syst., 1991, vol. 1, 594-597
  • [5] L. M. Arslan, Speaker transformation algorithm using segmental codebooks (STASC), Speech Communication, 28 (1999), No. 28, 211-226
  • [6] O. Turk, L. Arslan, Robust processing techniques for voice conversion, Comput. Speech Lang., 20 (2006), No. 4, 441-467
  • [7] M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, Transformation of formants for voice conversion using artificial neural networks, Speech Communication, 16 (1995), No. 2, 207-216
  • [8] Srinivas Desai, Alan W. Black, B. Yegnanarayana, Kishore Prahallad, Spectral Mapping Using Artificial Neural Networks for Voice Conversion, IEEE Trans. on Audio, Speech, and Language Processing, 18 (2010), No. 5, 954-964
  • [9] Y. Stylianou, O. Cappé, and E. Moulines, Continuous Probabilistic Transform for Voice Conversion, IEEE Trans. on Speech and Audio Processing, 6 (1998), No. 2, 131-142
  • [10] A. Kain, High resolution voice transformation, Ph.D. dissertation, OGI School of Sci. and Eng., Beaverton, OR, 2001.
  • [11] T. Toda, A. W. Black, and K. Tokuda, Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, IEEE Trans. on Audio, Speech, and Language Processing, 15 (2007), No. 8, 2222-2235
  • [12] H. Valbret, E. Moulines, and J. P. Tubach, Voice transformation using PSOLA technique, Speech Communication, 11 (1992), No. 2-3, 145-148
  • [13] D. Rentzos, S. Vaseghi, Q. Yan, and C. H. Ho, Voice conversion through transformation of spectral and intonation features, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004, vol. 1, 21-24
  • [14] Z. W. Shuang, R. Bakis, S. Shechtman, D. Chazan, and Y. Qin, Frequency warping based on mapping formant parameters, in Proc. of INTERSPEECH 2006, 2290-2293
  • [15] E. Helander, et al., Voice Conversion Using Partial Least Squares Regression, IEEE Trans. on Audio, Speech, and Language Processing, 18 (2010), No. 5, 912-921
  • [16] E. Helander, H. Silen, T. Virtanen, M. Gabbouj, Voice Conversion Using Dynamic Kernel Partial Least Squares Regression, IEEE Trans. on Audio, Speech, and Language Processing, in print
  • [17] P. Song, et al., Voice conversion using support vector regression, Electronics Letters, 47 (2011), No. 18, 1045-1046
  • [18] Joshua B. Tenenbaum, William T. Freeman, Separating Style and Content with Bilinear Models, Neural Computation, 12 (2000), No. 6, 1247-1283
  • [19] Victor Popa, Jani Nurminen, Moncef Gabbouj, A Novel Technique for Voice Conversion Based on Style and Content Decomposition with Bilinear Models, in Proc. of INTERSPEECH 2009, 2655-2658
  • [20] N. Xu, et al., Voice conversion based on state-space model for modelling spectral trajectory, Electronics Letters, 45 (2009), No. 14, 763-764
  • [21] Neil Lawrence, Probabilistic Non-linear Principal Component Analysis with Gaussian Process Latent Variable Models, Journal of Machine Learning Research, 6 (2005), 1783-1816
  • [22] Jack M. Wang, David J. Fleet, and Aaron Hertzmann, Multifactor Gaussian Process Models for Style-Content Separation, in Proceedings of the 24th International Conference on Machine Learning, Corvallis, 227 (2007), 975-982
  • [23] K. Grochow, S. L. Martin, A. Hertzmann, and Z. Popovic, Style-Based Inverse Kinematics, Proc. ACM SIGGRAPH, 23 (2004), No. 3, 522-531
  • [24] Jack M. Wang, David J. Fleet, and Aaron Hertzmann, Gaussian Process Dynamical Models for Human Motion, IEEE Trans. on Pattern Analysis and Machine Intelligence, 30 (2008), No. 2, 283-297
  • [25] Daniel Erro, Asuncion Moreno, Antonio Bonafonte, Flexible Harmonic/Stochastic Speech Synthesis, in Proc. of the 6th ISCA Workshop on Speech Synthesis, 2007, 194-199
  • [26] R. Urtasun, D. J. Fleet, A. Hertzmann, and P. Fua, Priors for people tracking from small training sets, in Proc. of Int. Conf. on Computer Vision (ICCV), 2005, 403-410
Document type
YADDA identifier
bwmeta1.element.baztech-article-BPS1-0050-0101