10.24425/aoa.2019.129724
Speech Enhancement Based on Constrained Low-rank Sparse Matrix Decomposition Integrated with Temporal Continuity Regularisation
References
Abdali S., NaserSharif B. (2017), Non-negative matrix factorization for speech/music separation using source dependent decomposition rank, temporal continuity term and filtering, Biomedical Signal Processing and Control, 36, 168–175, doi: 10.1016/j.bspc.2017.03.010.
Bando Y. et al. (2018), Speech enhancement based on Bayesian low-rank and sparse decomposition of multichannel magnitude spectrograms, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26, 2, 215–230, doi: 10.1109/TASLP.2017.2772340.
Boll S.F. (1979), Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech, and Signal Processing, 27, 2, 113–120, doi: 10.1109/TASSP.1979.1163209.
Bouwmans T., Sobral A., Javed S., Jung S.K., Zahzah E.-H. (2017), Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset, Computer Science Review, 23, 1–71, doi: 10.1016/j.cosrev.2016.11.001.
Cai J.F., Candès E.J., Shen Z. (2010), A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20, 4, 1956–1982, doi: 10.1137/080738970.
Candès E.J., Li X., Ma Y., Wright J. (2011), Robust principal component analysis?, Journal of the ACM, 58, 3, 1–37, doi: 10.1145/1970392.1970395.
Candès E.J., Plan Y. (2010), Matrix completion with noise, Proceedings of the IEEE, 98, 6, 925–936, doi: 10.1109/JPROC.2009.2035722.
Cohen I. (2004), Speech enhancement using a noncausal a priori SNR estimator, IEEE Signal Processing Letters, 11, 9, 725–728, doi: 10.1109/LSP.2004.833478.
Ephraim Y., Van Trees H. (1995), A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, 3, 4, 251–266, doi: 10.1109/89.397090.
Hermus K., Wambacq P., Hamme H.V. (2007), A review of signal subspace speech enhancement and its application to noise robust speech recognition, EURASIP Journal on Advances in Signal Processing, 1–15, doi: 10.1155/2007/45821.
Hu Y., Loizou P.C. (2003), A generalized subspace approach for enhancing speech corrupted by colored noise, IEEE Transactions on Speech and Audio Processing, 11, 4, 334–342, doi: 10.1109/TSA.2003.814458.
Hu Y., Loizou P.C. (2008), Evaluation of objective quality measures for speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing, 16, 1, 229–238, doi: 10.1109/TASL.2007.911054.
Jin K.H., Ye J.C. (2018), Sparse and low-rank decomposition of a Hankel structured matrix for impulse noise removal, IEEE Transactions on Image Processing, 27, 3, 1448–1461, doi: 10.1109/TIP.2017.2771471.
Kammi S., Mollaei M.R.K. (2017), Noisy speech enhancement with sparsity regularization, Speech Communication, 87, 58–69, doi: 10.1016/j.specom.2017.01.003.
Kheder W.B., Matrouf D., Bousquet P.-M., Bonastre J.-F., Ajili M. (2017), Fast i-vector denoising using MAP estimation and a noise distributions database for robust speaker recognition, Computer Speech & Language, 45, 104–122, doi: 10.1016/j.csl.2016.12.007.
Kolbæk M., Tan Z.-H., Jensen J. (2017), Speech intelligibility potential of general and specialized deep neural network based speech enhancement systems, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25, 1, 153–167, doi: 10.1109/TASLP.2016.2628641.
Li X., Fan M., Liu L., Li W. (2018), Distributed-microphones based in-vehicle speech enhancement via sparse and low-rank spectrogram decomposition, Speech Communication, 98, 51–62, doi: 10.1016/j.specom.2017.12.008.
Liu H., Peng J. (2018), Sparse signal recovery via alternating projection method, Signal Processing, 143, 161–170, doi: 10.1016/j.sigpro.2017.09.003.
Loizou P.C. (2007), Speech Enhancement: Theory and Practice, New York: Taylor & Francis.
Lu Y., Loizou P.C. (2008), A geometric approach to spectral subtraction, Speech Communication, 50, 6, 453–466, doi: 10.1016/j.specom.2008.01.003.
Mavaddaty S., Ahadi S.M., Seyedin S. (2016), A novel speech enhancement method by learnable sparse and low-rank decomposition and domain adaptation, Speech Communication, 76, 42–60, doi: 10.1016/j.specom.2015.11.003.
Mohammadiha N., Leijon A. (2013), Nonnegative HMM for babble noise derived from speech HMM: Application to speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing, 21, 5, 998–1011, doi: 10.1109/TASL.2013.2243435.
Moor, de B. (1993), The singular value decomposition and long and short spaces of noisy matrices, IEEE Transactions on Signal Processing, 41, 9, 2826–2839, doi: 10.1109/78.236505.
Paliwal K., Schwerin B., Wójcicki K. (2012), Speech enhancement using a minimum mean-square error short-time spectral modulation magnitude estimator, Speech Communication, 54, 2, 282–305, doi: 10.1016/j.specom.2011.09.003.
Paliwal K., Wójcicki K., Schwerin B. (2010), Single-channel speech enhancement using spectral subtraction in the short-time modulation domain, Speech Communication, 52, 5, 450–475, doi: 10.1016/j.specom.2010.02.004.
Plapous C., Marro C., Scalart P. (2006), Improved signal-to-noise ratio estimation for speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing, 14, 6, 2098–2108, doi: 10.1109/TASL.2006.872621.
Quatieri T. (2002), Discrete-time speech signal processing: principles and practice, Prentice Hall, Upper Saddle River, NJ.
Rugini L., Banelli P. (2016), On the equivalence of maximum SNR and MMSE estimation: applications to additive non-Gaussian channels and quantized observations, IEEE Transactions on Signal Processing, 64, 23, 6190–6199, doi: 10.1109/TSP.2016.2607152.
Scalart P., Vieira-Filho J. (1996), Speech enhancement based on a priori signal to noise estimation, Proceedings of the 21st IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, GA, doi: 10.1109/ICASSP.1996.543199.
Shannon B., Paliwal K. (2006), Role of phase estimation in speech enhancement, [in:] INTERSPEECH-2006, paper 1330-Tue3FoP.4, https://www.isca-speech.org/archive/archive_papers/interspeech_2006/i06_1330.pdf.
Shi J., Song W. (2016), Sparse principal component analysis with measurement errors, Journal of Statistical Planning and Inference, 175, 87–99, doi: 10.1016/j.jspi.2016.03.001.
Stark A., Paliwal K. (2011), Use of speech presence uncertainty with MMSE spectral energy estimation for robust automatic speech recognition, Speech Communication, 53, 1, 51–61, doi: 10.1016/j.specom.2010.08.001.
Sun C., Mu J. (2015), An eigenvalue filtering based subspace approach for speech enhancement, Noise Control Engineering Journal, 63, 1, 36–48, doi: 10.3397/1/376305.
Sun C., Xie J., Leng Y. (2016), A signal subspace speech enhancement approach based on joint low-rank and sparse matrix decomposition, Archives of Acoustics, 41, 2, 245–254, doi: 10.1515/aoa-2016-0024.
Sun C., Zhu Q., Wan M. (2014), A novel speech enhancement method based on constrained low-rank and sparse matrix decomposition, Speech Communication, 60, 44–55, doi: 10.1016/j.specom.2014.03.002.
Sun M., Li Y., Gemmeke J.F., Zhang X. (2015), Speech enhancement under low SNR conditions via noise estimation using sparse and low-rank NMF with Kullback-Leibler divergence, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 7, 1233–1242, doi: 10.1109/TASLP.2015.2427520.
Tan H., Cheng B., Feng J., Feng G., Wang W., Zhang Y.-J. (2013), Low-n-rank tensor recovery based on multi-linear augmented Lagrange multiplier method, Neurocomputing, 119, 144–152, doi: 10.1016/j.neucom.2012.03.039.
Virtanen T. (2007), Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria, IEEE Transactions on Audio, Speech, and Language Processing, 15, 3, 1066–1074, doi: 10.1109/TASL.2006.885253.
Wiener N. (1949), Extrapolation, interpolation, and smoothing of stationary time series, New York: Wiley.
Wright J., Ganesh A., Rao S., Peng Y., Ma Y. (2009), Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization, [in:] Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, A. Culotta (Eds), pp. 2080–2088, http://papers.nips.cc/paper/3704-robust-principal-component-analysis-exact-recovery-of-corrupted-low-rank-matrices-via-convex-optimization.pdf.
Xu H., Caramanis C., Sanghavi S. (2012), Robust PCA via outlier pursuit, IEEE Transactions on Information Theory, 58, 5, 3047–3064, doi: 10.1109/TIT.2011.2173156.
Zhang Y., Zhao Y. (2013), Real and imaginary modulation spectral subtraction for speech enhancement, Speech Communication, 55, 4, 509–522, doi: 10.1016/j.specom.2012.09.005.
Zhen L., Peng D., Yi Z., Xiang Y., Chen P. (2017), Underdetermined blind source separation using sparse coding, IEEE Transactions on Neural Networks and Learning Systems, 28, 12, 3102–3108, doi: 10.1109/TNNLS.2016.2610960.