The Effect of Voice over IP Transmission Degradations on MAP-EM-GMM Speaker Verification Performance

Maciejko, W.

doi:10.1515/aoa-2015-0042

Abstract

Despite the growing importance of packet-switching systems, thorough analyses of how VoIP transmission affects speech and speaker recognition performance are still scarce. Voice over IP transmission systems use packet switching, which does not guarantee delivery. The main disadvantage of VoIP is packet loss, which has a major impact on the quality experienced by the users of the network. Several techniques, collectively referred to as packet loss concealment, exist to mask the effects of packet loss. In this study, the effect of voice transmission over IP on the performance of an automatic speaker verification system was investigated. The analyzed system was based on MAP-EM-GMM modelling methods. Four speech codecs of the H.323 standard were investigated, with special emphasis on the packet loss phenomenon and on various packet loss concealment techniques.
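The abstract singles out packet loss and packet loss concealment as the key degradations. The following is a minimal sketch of those two ideas, not the author's experimental setup. It assumes 8 kHz narrowband audio sent as 20 ms packets of 160 samples, a simplified two-state loss model in the spirit of Gilbert (1960), and two basic concealment strategies (silence substitution and packet repetition). The function names `gilbert_losses`, `packetize` and `conceal` are hypothetical; the H.323 codecs studied in the paper use their own codec-specific concealment.

```python
import random
from typing import List, Sequence

# Assumption: 8 kHz narrowband audio sent as 20 ms packets (160 samples each).
FRAME = 160


def gilbert_losses(n_packets: int, p: float, q: float, seed: int = 0) -> List[bool]:
    """Per-packet loss mask from a simplified two-state Gilbert-style model (hypothetical helper).

    p: probability of moving from the 'good' (received) state to the 'bad' (lost) state
    q: probability of moving from the 'bad' state back to the 'good' state
    Every packet sent in the bad state is lost, so the long-run loss rate is p / (p + q)
    and the mean burst length is 1 / q packets.
    """
    rng = random.Random(seed)
    bad = False  # start in the good state
    mask = []
    for _ in range(n_packets):
        if bad:
            bad = rng.random() >= q  # stay in the bad state with probability 1 - q
        else:
            bad = rng.random() < p   # enter the bad state with probability p
        mask.append(bad)
    return mask


def packetize(signal: Sequence[float], frame: int = FRAME) -> List[List[float]]:
    """Split a sample sequence into consecutive packets (the last one may be shorter)."""
    return [list(signal[i:i + frame]) for i in range(0, len(signal), frame)]


def conceal(packets: List[List[float]], lost: List[bool], method: str = "repeat") -> List[float]:
    """Rebuild the received sample stream, masking lost packets (hypothetical helper).

    'zero':   silence substitution, a lost packet is replaced by zeros
    'repeat': packet repetition, a lost packet is replaced by the last received one
              (zeros if nothing has been received yet)
    """
    if method not in ("zero", "repeat"):
        raise ValueError(f"unknown concealment method: {method}")
    out: List[float] = []
    last = None
    for pkt, is_lost in zip(packets, lost):
        if not is_lost:
            out.extend(pkt)
            last = pkt
        elif method == "repeat" and last is not None:
            out.extend(last[:len(pkt)])  # trim if the lost packet was a short final one
        else:
            out.extend([0.0] * len(pkt))
    return out


if __name__ == "__main__":
    signal = [float(i % 50) for i in range(8000)]  # 1 s of a toy signal at 8 kHz
    packets = packetize(signal)
    lost = gilbert_losses(len(packets), p=0.05, q=0.5, seed=1)
    for method in ("zero", "repeat"):
        received = conceal(packets, lost, method)
        print(method, sum(lost), "lost packets, output length", len(received))
```

Passing each concealed stream through a verification front end (for example, cepstral feature extraction followed by GMM scoring) would be one way to compare the strategies; that part of the pipeline is not shown here.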

Keywords

automatic speaker verification, packet loss, speech compression, voice over IP

Publisher

Instytut Podstawowych Problemów Techniki PAN
Komitet Akustyki PAN
Polskie Towarzystwo Akustyczne

Journal

Archives of Acoustics

Year

2015

Volume

Vol. 40, No. 3

Pages

407–417

Physical description

Bibliography: 29 items; figures, tables, charts

Contributors

Author

Maciejko W.

w.maciejko@abw.gov.pl

Forensic Bureau, Internal Security Agency, Rakowiecka 2A, 00-993 Warsaw, Poland

References

1. Besacier L., Ariyaeeinia A.M., Mason J.S., Bonastre J.F., Mayorga P., Fredouille C., Meignier S., Siau J., Evans N.W.D., Auckenthaler R., Stapert R. (2004), Voice biometrics over the internet in the framework of COST action 275, EURASIP Journal on Applied Signal Processing 2004:4, 466–479, Hindawi Publishing Corporation.
2. Besacier L., Grassi S., Dufaux A., Ansorge M., Pellandini F. (2000), GSM speech coding and speaker recognition, ICASSP.
3. Byrne C., Foulkes P. (2004), The mobile phone effect on vowel formants, International Journal of Speech, Language and the Law, 11, 1.
4. Davidson J., Peters J. (2000), Voice over IP fundamentals. A systematic approach to understanding the basics of Voice over IP, Cisco Press, Indianapolis.
5. Davis S.B., Mermelstein P. (1980), Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing, 28, 4, 357–366.
6. Furui S. (1981), Cepstral analysis technique for automatic speaker verification, IEEE Transactions on Acoustics, Speech, and Signal Processing, 29, 254–272.
7. Gilbert E.N. (1960), Capacity of a burst-noise channel, The Bell System Technical Journal, September.
8. IETF (2004), The Effect of Packet Loss on Voice Quality for TDM over Pseudowires, Internet Draft, October 20.
9. Jajszczyk A. (2009), Introduction to telecommunication, [in Polish: Wstęp do telekomunikacji], Podręczniki akademickie WNT, Warszawa.
10. Jelassi S., Rubino G.A. (2011), A study of artificial speech quality assessor of VoIP calls subject to limited bursty packet losses, EURASIP Journal on Image and Video Processing, 2011:9.
11. Maciejko W. (2012), Biometric speaker recognition in forensic science, [in Polish: Biometryczne rozpoznawanie mówców w kryminalistyce], Problemy Kryminalistyki 275, Warszawa.
12. Maciejko W. (2014), Impact of telephone transmission VoIP on forensic automatic speaker identification system based on EM-UBM-MAP algorithms, [in Polish: Wpływ transmisji głosu z wykorzystaniem telefonii internetowej VoIP na skuteczność automatycznego systemu kryminalistycznej identyfikacji mówców opartego na metodzie EM-UBM-MAP].
13. Magrin-Chagnolleau I., Gravier G., Blouet R. (2001), Overview of the 2000-2001 ELISA consortium research activities, 2001: A Speaker Odyssey, The Speaker Recognition Workshop, ISCA, Crete.
14. Martin A., Doddington G., Kamm T., Ordowski M., Przybocki M. (1997), The DET curve in assessment of detection task performance, Proc. Eurospeech ’97, pp. 1895–1898, Rhodes, Greece.
15. Mayorga P., Besacier L., Lamy R., Serignat J.-F. (2003), Audio packet loss over IP and speech recognition, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU ’03), 30 Nov.–3 Dec., 607–612.
16. Mohamed S., Rubino G., Varela M. (2004), Performance evaluation of real-time speech through a packet network: a random networks-based approach, Performance Evaluation: An International Journal, 57, 141–161.
17. Peinado A.M., Segura F.C. (2006), Speech recognition over digital channels. Robustness and Standards, John Wiley & Sons, Ltd.
18. Sanneck H. (2000), Packet loss recovery and control for voice transmission over the internet, Ph.D. Thesis, Technische Universität Berlin, unpublished.
19. Staroniewicz P. (2006), Influence of specific VoIP transmission conditions on speaker recognition problem, Archives of Acoustics, 31, 4 (Supplement), 197–203.
20. Staroniewicz P. (2007), Tests of robustness of GMM speaker verification in VoIP telephony, Archives of Acoustics, 32, 4 (Supplement), 187–192.
21. Recommendation ITU-T H.323 (2009), ITU-T H.323 Series H: Audiovisual and multimedia systems. Infrastructure of audiovisual services – Systems and terminal equipment for audiovisual services. Packet-based multimedia communications systems, International Telecommunication Union 12/2009.
22. Reynolds D.A., Quatieri T.F., Dunn R.B. (2000), Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, 10, 19–41.
23. Reynolds D.A. (1996), The effects of handset variability on speaker recognition performance: experiments on the switchboard corpus, Acoustics, Speech and Signal Processing, ICASSP-96, Conference Proceedings.
24. Reynolds D.A., Rose R.C. (1995), Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Transactions on Speech and Audio Processing, 3, 1, 72–83.
25. Reynolds D.A., Zissman M.A., Quatieri T.F., O’Leary G.C., Carlson B.A. (1995), The effect of telephone transmission degradation on speaker recognition performance, Acoustics, Speech, and Signal Processing, 1995, ICASSP-95.
26. Rose P. (2002), Forensic speaker identification, Taylor & Francis, New York.
27. Transnexus, Inc. (2013), Four VoIP Trends to Watch for in 2013, Retrieved May 2, 2013 from Transnexus, Inc. Newsletter Issue 5, January 2013, http://www.transnexus.com/index.php/issue-5-january-2013/four-voip-trends-to-watch-for-in-2013.
28. Viikki O., Laurila K. (1998), Cepstral domain segmental feature vector normalization for noise robust speech recognition, Speech Communication, 25, 133–147.
29. Young S., Evermann G., Gales M., Hain T., Kershaw D., Liu X., Moore G., Odell J., Ollason D., Povey D., Valtchev V., Woodland P. (2009), The HTK Book v3.4, Cambridge.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-8743cb37-baaa-434a-80c6-13394688f716