

Article title

Voice Conversion Based on Hybrid SVR and GMM

Authors
Content / Full text
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
A novel voice conversion (VC) method based on hybrid support vector regression (SVR) and a Gaussian mixture model (GMM) is presented in the paper; the mapping abilities of SVR and the GMM are exploited to map the spectral features of the source speaker to those of the target speaker. A new strategy for F0 transformation is also presented: the F0 values are modeled together with the spectral features in a joint GMM and predicted from the converted spectral features using the SVR method. Subjective and objective tests are carried out to evaluate the VC performance; the experimental results show that speech converted with the proposed method achieves better quality than speech converted with the state-of-the-art GMM method. In addition, a VC method based on non-parallel data is proposed: the speaker-specific information is investigated using the SVR method, and preliminary subjective experiments demonstrate that the proposed method is feasible when a parallel corpus is not available.
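The SVR mapping stage described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `SVR`, uses synthetic frames in place of aligned spectral features from parallel utterances, and trains one scalar regressor per target feature dimension.

```python
# Illustrative sketch (not the authors' system): frame-wise spectral-feature
# mapping with per-dimension support vector regression. The data here are
# synthetic; a real VC system would use time-aligned spectral frames
# (e.g. mel-cepstral coefficients) from a parallel corpus.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_frames, n_dims = 200, 4  # toy sizes; real spectral features have ~24 dims

# Synthetic "source" frames and a fabricated source-to-target relationship.
X_src = rng.normal(size=(n_frames, n_dims))
Y_tgt = 0.8 * X_src + 0.3 * np.tanh(X_src) \
        + 0.05 * rng.normal(size=(n_frames, n_dims))

# Standard SVR predicts a scalar, so fit one regressor per target
# dimension, each conditioned on the full source frame.
models = [SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X_src, Y_tgt[:, d])
          for d in range(n_dims)]

def convert(frames):
    """Map source spectral frames to predicted target frames."""
    return np.column_stack([m.predict(frames) for m in models])

Y_hat = convert(X_src)
rmse = float(np.sqrt(np.mean((Y_hat - Y_tgt) ** 2)))
print(f"training RMSE: {rmse:.3f}")
```

The paper's F0 strategy would extend this idea by appending F0 to the feature vectors in a joint GMM and regressing F0 from the converted spectral features with the same SVR machinery.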
Year
Pages
143-149
Physical description
Bibliography: 20 items, tables, charts.
Contributors
author
author
author
author
  • Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, P.R. China, pengsongseu@gmail.com
Bibliography
  • 1. Abe M., Nakamura S., Shikano K., Kuwabara H. (1988), Voice conversion through vector quantization, Proceedings of the 1988 International Conference on Acoustics, Speech, and Signal Processing, pp. 655-658, New York.
  • 2. Chen Y., Chu M., Chang E., Liu J., Liu R. (2003), Voice conversion with smoothed GMM and MAP adaptation, Proceedings of Eurospeech 2003, pp. 2413-2416, Geneva.
  • 3. Desai S., Black A.W., Yegnanarayana B., Prahallad K. (2010), Spectral mapping using artificial neural networks for voice conversion, IEEE Transactions on Audio, Speech, and Language Processing, 18, 5, 954-964.
  • 4. En-Najjary T., Rosec O., Chonavel T. (2003), A new method for pitch prediction from spectral envelope and its application in voice conversion, Proceedings of Eurospeech 2003, pp. 1753-1756, Geneva.
  • 5. Erro D., Moreno A. (2007), Frame Alignment Method for Cross-lingual Voice Conversion, Proceedings of Interspeech 2007, pp. 1969-1972, Antwerp.
  • 6. Inanoglu Z. (2003), Transforming pitch in a voice conversion framework, Master Thesis, St. Edmund's College, University of Cambridge.
  • 7. Kain A., Macon M.W. (1998), Spectral voice conversion for text-to-speech synthesis, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 285-288, Seattle.
  • 8. Kominek J., Black A.W. (2004), The CMU Arctic speech databases, Proceedings of the 5th ISCA Speech Synthesis Workshop, pp. 223-224, Pittsburgh.
  • 9. Kawahara H., Masuda-Katsuse I., de Cheveigné A. (1999), Restructuring speech representation using pitch adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds, Speech Communication, 27, 3, 187-207.
  • 10. Misra H., Ikbal S., Yegnanarayana B. (2003), Speaker-specific mapping for text-independent speaker recognition, Speech Communication, 39, 3-4, 301-310.
  • 11. Mizuno H., Abe M. (1995), Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt, Speech Communication, 16, 2, 153-164.
  • 12. Mouchtaris A., Spiegel J.V., Mueller P. (2004), Non-parallel training for voice conversion by maximum likelihood constrained adaptation, Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1-4, Montreal.
  • 13. Perez-Cruz F., Camps-Valls G., Soria-Olivas E., Perez-Ruixo J.J., Figueiras-Vidal A.R., Artes-Rodriguez A. (2002), Multi-dimensional function approximation and regression estimation, Proceedings of the International Conference on Artificial Neural Networks, pp. 757-762, Madrid.
  • 14. Perez-Cruz F., Navia-Vazquez A., AlarconDiana P., Artes-Rodriguez A. (2000), An IRWLS procedure for SVR, Proceedings of the 10th European Signal Processing Conference, pp. 725-728, Tampere.
  • 15. Shao X., Milner B. (2004), Pitch prediction from MFCC vectors for speech reconstruction, Proceedings of the 2004 International Conference on Acoustics, Speech, and Signal Processing, pp. 97-100, Montreal.
  • 16. Smits G.F., Jordan E.M. (2002), Improved SVM regression using mixtures of kernels, Proceeding of the 2002 International Joint Conference on Neural Networks, pp. 2785-2790, Honolulu.
  • 17. Stylianou Y., Cappe O., Moulines E. (1998), Continuous probabilistic transform for voice conversion, IEEE Transactions Speech and Audio Processing, 6, 2, 131-142.
  • 18. Toda T., Saruwatari H., Shikano K. (2001), Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum, Proceedings of the 2001 International Conference on Acoustics, Speech, and Signal Processing, pp. 841-844, Salt Lake City.
  • 19. Toda T., Black A.W., Tokuda K. (2005), Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter, Proceedings of the 2005 International Conference on Acoustics, Speech, and Signal Processing, pp. 9-12, Philadelphia.
  • 20. Ye H., Young S. (2006), Quality-enhanced voice morphing using maximum likelihood transformations, IEEE Transactions on Audio, Speech and Language Processing, 14, 4, 1301-1312.
Document type
YADDA identifier
bwmeta1.element.baztech-article-BUS8-0022-0002