Article title

Incoherent Discriminative Dictionary Learning for Speech Enhancement

Publication languages
EN
Abstracts
EN
Speech enhancement is a challenging task in signal processing, especially when the noise is nonstationary and speech-like. In this paper, a new incoherent discriminative dictionary learning algorithm is proposed to model both speech and noise. Its cost function accounts for both “source confusion” and “source distortion” errors and includes a regularization term that penalizes the coherence between the speech and noise sub-dictionaries. At the enhancement stage, sparse coding on the learnt dictionary is used to estimate the amplitude spectra of both the clean speech and the noise. In the final phase, a Wiener filter refines the clean speech estimate. Experiments on the Noizeus dataset, using two objective speech enhancement measures, frequency-weighted segmental SNR and Perceptual Evaluation of Speech Quality (PESQ), demonstrate that the proposed algorithm outperforms the other speech enhancement methods tested.
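The enhancement stage summarized in the abstract (sparse coding on a joint speech/noise dictionary, followed by Wiener filtering) can be illustrated with a minimal sketch. This record does not give the paper's cost function or its sparse coder, so everything below is an assumption made for illustration: orthogonal matching pursuit stands in for the sparse coding step, the squared Frobenius norm of the cross-Gram matrix stands in for the coherence penalty, and the names `omp`, `cross_coherence` and `enhance_frame` are hypothetical.

```python
import numpy as np


def omp(D, y, k):
    """Greedy orthogonal matching pursuit: approximate y with at most k atoms of D."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # no new atom improves the fit
        support.append(idx)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef


def cross_coherence(D_s, D_n):
    """Assumed incoherence penalty: squared Frobenius norm of D_s^T D_n."""
    return float(np.sum((D_s.T @ D_n) ** 2))


def enhance_frame(y, D_s, D_n, k=4, eps=1e-12):
    """Estimate the clean-speech amplitude spectrum of one noisy frame y."""
    D = np.hstack([D_s, D_n])          # joint dictionary [speech | noise]
    a = omp(D, y, k)                   # sparse code over both sub-dictionaries
    n_s = D_s.shape[1]
    s_hat = np.abs(D_s @ a[:n_s])      # speech amplitude estimate
    n_hat = np.abs(D_n @ a[n_s:])      # noise amplitude estimate
    # Wiener-style gain built from the two estimated power spectra.
    gain = s_hat**2 / (s_hat**2 + n_hat**2 + eps)
    return gain * y
```

With toy dictionaries `D_s = np.eye(4)[:, :2]` and `D_n = np.eye(4)[:, 2:]`, the cross-coherence is zero, and `enhance_frame(np.array([2.0, 0.0, 0.0, 1.0]), D_s, D_n, k=2)` keeps the component explained by the speech atoms while suppressing the one explained by the noise atoms.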
Year
Volume
Pages
42--54
Physical description
Bibliography: 43 items, figures, tables
Authors
author
  • Telecommunications Department, Higher Institute for Applied Science and Technology, HIAST, P.O. Box 31983, Damascus, Syria
author
  • Telecommunications Department, Higher Institute for Applied Science and Technology, HIAST, P.O. Box 31983, Damascus, Syria
author
  • Telecommunications Department, Higher Institute for Applied Science and Technology, HIAST, P.O. Box 31983, Damascus, Syria
Bibliography
  • [1] S. Boll, „Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, pp. 113-120, 1979 (doi: 10.1109/TASSP.1979.1163209).
  • [2] Y. Lu and P. C. Loizou, „A geometric approach to spectral subtraction", Speech Commun., vol. 50, no. 6, pp. 453-466, 2008 (doi: 10.1016/j.specom.2008.01.003).
  • [3] J. S. Lim and A. V. Oppenheim, „Enhancement and bandwidth compression of noisy speech", Proc. of the IEEE, vol. 67, no. 12, pp. 1586-1604, 1979 (doi: 10.1109/PROC.1979.11540).
  • [4] Y. Ephraim, „Statistical-model-based speech enhancement systems", Proc. of the IEEE, vol. 80, no. 10, pp. 1526-1555, 1992 (doi: 10.1109/5.168664).
  • [5] Y. Hu and P. C. Loizou, „A generalized subspace approach for enhancing speech corrupted by colored noise", IEEE Trans. Speech Audio Process., vol. 11, no. 4, pp. 334-341, 2003 (doi: 10.1109/TSA.2003.814458).
  • [6] J. Sun, J. Zhang, and M. Small, „Extension of the local subspace method to enhancement of speech with colored noise", Signal Process., vol. 88, no. 7, pp. 1881-1888, 2008 (doi: 10.1016/j.sigpro.2008.01.008).
  • [7] T. Sreenivas and P. Kirnapure, „Codebook constrained Wiener filtering for speech enhancement", IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 383-389, 1996 (doi: 10.1109/89.536932).
  • [8] S. Srinivasan, J. Samuelsson, and W. B. Kleijn, „Codebook driven short-term predictor parameter estimation for speech enhancement", IEEE Trans. Audio, Speech, and Language Process., vol. 14, no. 1, pp. 163-176, 2006 (doi: 10.1109/TSA.2005.854113).
  • [9] D. Y. Zhao and W. B. Kleijn, „HMM-based gain modeling for enhancement of speech in noise", IEEE Trans. Audio, Speech, and Language Process., vol. 15, no. 3, pp. 882-892, 2007 (doi: 10.1109/TASL.2006.885256).
  • [10] N. Mohammadiha, R. Martin, and A. Leijon, „Spectral domain speech enhancement using HMM state-dependent super-Gaussian priors", IEEE Signal Process. Lett., vol. 20, no. 3, pp. 253-256, 2013 (doi: 10.1109/LSP.2013.2242467).
  • [11] H. Veisi and H. Sameti, „Speech enhancement using hidden Markov models in Mel-frequency domain", Speech Commun., vol. 55, no. 2, pp. 205-220, 2013 (doi: 10.1016/j.specom.2012.2242467).
  • [12] K. W. Wilson, B. Raj, and P. Smaragdis, „Regularized non-negative matrix factorization with temporal dependencies for speech denoising", in Proc. of the 9th Ann. Conf. of the Int. Speech Commun. Association, Brisbane Interspeech 2008, Brisbane, Australia, 2008, pp. 411-414.
  • [13] M. Sun, Y. Li, J. Gemmeke, and X. Zhang, „Speech enhancement under low SNR conditions via noise estimation using sparse and low-rank NMF with Kullback-Leibler divergence", IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 7, pp. 1233-1242, 2015 (doi: 10.1109/TASLP.2015.2427520).
  • [14] N. Mohammadiha, P. Smaragdis, and A. Leijon, „Supervised and unsupervised speech enhancement using nonnegative matrix factorization", IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2140-2151, 2013 (doi: 10.1109/TASL.2013.2270369).
  • [15] C. D. Sigg, T. Dikk, and J. M. Buhmann, „Speech enhancement using generative dictionary learning", IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 6, pp. 1698-1712, 2012 (doi: 10.1109/TASL.2012.2187194).
  • [16] Y. Zhao, X. Zhao, and B. Wang, „A speech enhancement method based on sparse reconstruction of power spectral density", Computers & Elec. Engin., vol. 40, no. 4, pp. 1080-1089, 2014 (doi: 10.1016/j.compeleceng.2013.12.007).
  • [17] Y. Luo, G. Bao, Y. Xu, and Z. Ye, „Supervised monaural speech enhancement using complementary joint sparse representations", IEEE Signal Process. Lett., vol. 23, no. 2, pp. 237-241, 2016 (doi: 10.1109/LSP.2015.2509480).
  • [18] L. Zhang, G. Bao, J. Zhang, and Z. Ye, „Supervised single-channel speech enhancement using ratio mask with joint dictionary learning", Speech Commun., vol. 82, no. C, pp. 38-52, 2016 (doi: 10.1016/j.specom.2016.06.001).
  • [19] T. W. Shen and D. P. K. Lun, „A speech enhancement method based on sparse reconstruction on log-spectra", HKIE Trans., vol. 24, no. 1, pp. 24-34, 2017 (doi: 10.1080/1023697X.2016.1210545).
  • [20] M. Elad and M. Aharon, „Image denoising via sparse and redundant representations over learned dictionaries", IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736-3745, 2006 (doi: 10.1109/TIP.2006.881969).
  • [21] M. Aharon, M. Elad, and A. Bruckstein, „K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation", IEEE Trans. on Signal Process., vol. 54, no. 11, pp. 4311-4322, 2006 (doi: 10.1109/TSP.2006.881199).
  • [22] R. Rubinstein, M. Zibulevsky, and M. Elad, „Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit", Technical Rep. CS-2008-08, Technion - Israel Institute of Technology, Haifa, Israel, 2008.
  • [23] K. Engan, S. O. Aase, and J. Hakon Husoy, „Method of optimal directions for frame design", in Proc. IEEE Int. Conf. on Acoust., Speech, and Sig. Process. ICASSP'99, Phoenix, AZ, USA, 1999, vol. 5 (doi: 10.1109/ICASSP.1999.760624).
  • [24] Y. Suo, M. Dao, U. Srinivas, V. Monga, and T. D. Tran, „Structured Dictionary Learning for Classification", 2014, arXiv:1406.1943.
  • [25] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, „Supervised dictionary learning", in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press, 2008, pp. 1033-1040.
  • [26] Q. Zhang and B. Li, „Discriminative K-SVD for dictionary learning in face recognition", in Proc. 23rd IEEE Conf. on Comp. Vision and Pattern Recogn. CVPR 2010, San Francisco, CA, USA, 2010, pp. 2691-2698 (doi: 10.1109/CVPR.2010.5539989).
  • [27] Z. Jiang, Z. Lin, and L. S. Davis, „Label consistent K-SVD: Learning a discriminative dictionary for recognition", IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2651-2664, 2013 (doi: 10.1109/TPAMI.2013.88).
  • [28] I. Ramirez, P. Sprechmann, and G. Sapiro, „Classification and clustering via dictionary learning with structured incoherence and shared features", in Proc. 23rd IEEE Conf. on Comp. Vision and Pattern Recogn. CVPR 2010, San Francisco, CA, USA, 2010, pp. 3501-3508 (doi: 10.1109/CVPR.2010.5539964).
  • [29] S. Kong and D. Wang, „A dictionary learning approach for classification: Separating the particularity and the commonality", in Proc. 12th Eur. Conf. on Comp. Vision ECCV 2012, Florence, Italy, 2012, pp. 186-199 (doi: 10.1007/978-3-642-33718-5_14).
  • [30] M. Yang, L. Zhang, and X. Feng, „Sparse representation based Fisher discrimination dictionary learning for image classification", Int. J. of Computer Vision, vol. 109, no. 3, pp. 209-232, 2014 (doi: 10.1007/s11263-014-0722-8).
  • [31] T. H. Vu and V. Monga, „Fast low-rank shared dictionary learning for image classification", IEEE Trans. on Image Process., vol. 26, no. 11, pp. 5160-5175, 2017 (doi: 10.1109/TIP.2017.2729885).
  • [32] J. A. Tropp, „Greed is good: algorithmic results for sparse approximation", IEEE Trans. on Inform. Theory, vol. 50, no. 10, pp. 2231-2242, 2004 (doi: 10.1109/TIT.2004.834793).
  • [33] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, „Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition", in Proc. of 27th Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, USA, 1993 (doi: 10.1109/ACSSC.1993.342465).
  • [34] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, „Least angle regression", The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004 (doi: 10.1214/009053604000000067).
  • [35] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, „Distributed optimization and statistical learning via the alternating direction method of multipliers", Found. and Trends in Machine Learn., vol. 3, no. 1, pp. 1-122, 2011 (doi: 10.1561/2200000016).
  • [36] A. Beck and M. Teboulle, „A fast iterative shrinkage-thresholding algorithm for linear inverse problems", SIAM J. on Imaging Sci., vol. 2, no. 1, pp. 183-202, 2009 (doi: 10.1137/080716542).
  • [37] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, „Online learning for matrix factorization and sparse coding", J. of Machine Learn. Res., vol. 11, pp. 19-60, 2010, arXiv:0908.0050.
  • [38] „Noizeus: A noisy speech corpus for evaluation of speech enhancement algorithms" [Online]. Available: http://ecs.utdallas.edu/loizou/speech/noizeus/
  • [39] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, „Perceptual evaluation of speech quality (PESQ) - A new method for speech quality assessment of telephone networks and codecs", in Proc. IEEE Int. Conf. on Acoust., Speech, and Sig. Process., Salt Lake City, UT, USA, 2001, pp. 749-752 (doi: 10.1109/ICASSP.2001.941023).
  • [40] P. C. Loizou, Speech Enhancement. Theory and Practice. Boca Raton, FL, USA: CRC, 2013 (ISBN: 9781138075573).
  • [41] Y. Hu and P. C. Loizou, „Evaluation of objective quality measures for speech enhancement", IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 1, pp. 229-238, 2008 (doi: 10.1109/TASL.2007.911054).
  • [42] J. Ma, Y. Hu, and P. C. Loizou, „Objective measures for predicting speech intelligibility in noisy conditions based on new band importance functions", J. Acoust. Soc. Am., vol. 125, no. 5, pp. 3387-3405, 2009.
  • [43] Noisex-92: Database of recording of various noises [Online]. Available: www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html
Notes
Record prepared under agreement 509/P-DUN/2018 with funds from the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-28c0c07a-28a4-47ab-8947-bc7b658efaf2