Speech Enhancement Based on Constrained Low-rank Sparse Matrix Decomposition Integrated with Temporal Continuity Regularisation

Authors

  • Chengli SUN Nanchang Hangkong University, China
  • Conglin YUAN Nanchang Hangkong University, China

Abstract

Speech enhancement under strong noise conditions is a challenging problem. Low-rank and sparse matrix decomposition (LSMD) theory has recently been applied to speech enhancement with good results. Existing LSMD algorithms treat each frame as an independent observation. However, real-world speech usually has a temporal structure, and its acoustic characteristics vary slowly over time. In this paper, we propose a speech enhancement method based on temporal continuity constrained low-rank sparse matrix decomposition (TCCLSMD). In this method, speech separation is formulated as a TCCLSMD problem, and temporal continuity constraints are imposed during the LSMD process. We develop an alternating optimisation algorithm for decomposing the noisy spectrogram. With TCCLSMD, the recovered speech spectrogram is more consistent with the structure of the clean speech spectrogram, which leads to more stable and reasonable results than the existing LSMD algorithm. Experiments with various noise types show that the proposed algorithm outperforms traditional speech enhancement algorithms, yielding less residual noise and lower speech distortion.
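The abstract does not state the TCCLSMD objective, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the noisy magnitude spectrogram Y (frequency x frames) is split into a low-rank noise part L and a sparse speech part S, with temporal continuity imposed as a squared first-difference penalty across frames on S. The assumed (penalised, unconstrained) cost is 0.5*||Y - L - S||_F^2 + mu*||L||_* + lam*||S||_1 + (beta/2)*sum_t ||S[:, t] - S[:, t-1]||^2. The weights mu, lam and beta, the helper names (svt, soft, temporal_grad, objective, tc_lsmd) and the inner proximal-gradient solver are all hypothetical choices made for illustration.

```python
import numpy as np


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft(X, tau):
    """Elementwise soft thresholding: proximal operator of tau * L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def temporal_grad(S):
    """Gradient of 0.5 * sum_t ||S[:, t] - S[:, t-1]||^2 with respect to S."""
    d = S[:, 1:] - S[:, :-1]  # frame-to-frame differences
    g = np.zeros_like(S)
    g[:, 1:] += d
    g[:, :-1] -= d
    return g


def objective(Y, L, S, mu, lam, beta):
    """Assumed temporal-continuity LSMD cost (illustrative, not the paper's)."""
    fit = 0.5 * np.sum((Y - L - S) ** 2)
    nuclear = mu * np.linalg.svd(L, compute_uv=False).sum()
    sparse = lam * np.abs(S).sum()
    smooth = 0.5 * beta * np.sum((S[:, 1:] - S[:, :-1]) ** 2)
    return fit + nuclear + sparse + smooth


def tc_lsmd(Y, mu=1.0, lam=0.1, beta=1.0, n_outer=50, n_inner=5):
    """Alternating minimisation: exact L-step, proximal-gradient S-step.

    Y is a (frequency x frames) magnitude spectrogram; L models noise, S speech.
    """
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    # The smooth part's gradient in S is (S - R) + beta * S D D^T; the
    # first-difference Laplacian has spectral norm < 4, so 1 + 4*beta bounds
    # the Lipschitz constant and 1 / (1 + 4*beta) is a safe step size.
    step = 1.0 / (1.0 + 4.0 * beta)
    for _ in range(n_outer):
        L = svt(Y - S, mu)  # exact minimisation over L with S fixed
        R = Y - L
        for _ in range(n_inner):  # a few ISTA steps over S with L fixed
            grad = (S - R) + beta * temporal_grad(S)
            S = soft(S - step * grad, step * lam)
    return L, S


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T = 64, 100
    noise = np.outer(rng.uniform(0.5, 1.0, F), np.ones(T))  # stationary, rank 1
    speech = np.zeros((F, T))
    speech[10:20, 30:60] = 2.0  # a slowly varying speech-like block
    Y = noise + speech
    L, S = tc_lsmd(Y, mu=2.0, lam=0.2, beta=1.0)
    print("cost:", objective(Y, L, S, 2.0, 0.2, 1.0))
```

Because the temporal penalty couples neighbouring frames, the S-subproblem has no closed-form proximal step; a few proximal-gradient iterations keep each outer iteration cheap while still decreasing the assumed cost, and setting beta = 0 reduces the sketch to a plain frame-independent LSMD.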

Keywords:

speech enhancement, temporal continuity, low-rank and sparse decomposition
