Archives of Acoustics, 42, 2, pp. 287–295, 2017

Frequency Selection Based Separation of Speech Signals with Reduced Computational Time Using Sparse NMF

Yash Vardhan VARSHNEY
Aligarh Muslim University

Zia Ahmad ABBASI
Aligarh Muslim University

Musiur Raza ABIDI
Aligarh Muslim University


This paper describes how wavelet decomposition can speed up the separation of mixed speech signals with non-negative matrix factorisation (NMF). It assumes that training data of the individual speakers have already been recorded and that their basis vectors are available. The magnitude spectrogram of the mixed signal is factorised with NMF, taking the sparseness of speech signals into account. The high-frequency components of the signal carry only a very small fraction of its energy. Rejecting these components reduces the size of the input, which in turn reduces the computational time of the matrix factorisation. The lower-energy part of the signal is separated out using wavelet decomposition. The work uses wideband microphone speech signals and standard audio signals from digital video equipment. Compared with an existing model, the proposed model separates the sources better, as measured by the correlation between separated and original signals. It also achieves a higher signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR) than the existing model. Finally, the proposed model reduces computational time, which results in faster operation.
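The pipeline in the abstract (discard high-frequency content to shrink the input, then factorise the mixture on pre-trained speaker bases with a sparseness penalty) can be illustrated with a minimal sketch. Several choices here are assumptions, not the authors' exact method: a Haar wavelet with approximation-only reconstruction, a Euclidean cost with an L1 penalty on the activations solved by multiplicative updates (one common sparse-NMF formulation), and soft masking to split the mixture. The function names `haar_lowband`, `sparse_activations` and `separate` are hypothetical.

```python
import numpy as np

def haar_lowband(x, levels=1):
    """Keep only Haar DWT approximation coefficients (assumed wavelet).

    Each level halves the signal length and keeps the lower half of the
    band; the discarded detail coefficients hold the high-frequency,
    low-energy content, so later processing works on a smaller input.
    """
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        if a.size % 2:                      # pad odd lengths by repeating the last sample
            a = np.append(a, a[-1])
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a

def sparse_activations(V, W, lam=0.1, n_iter=200, seed=0):
    """Solve min_H 0.5*||V - W H||_F^2 + lam*sum(H), H >= 0, with W fixed.

    Multiplicative update: H <- H * (W^T V) / (W^T W H + lam).
    The L1 term (lam) pushes activations towards sparse solutions.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-12                             # guards against division by zero
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + lam + eps)
    return H

def separate(V, W1, W2, lam=0.1, n_iter=200):
    """Split a non-negative mixture magnitude V using two speakers' bases.

    The activations are estimated on the stacked bases [W1 W2]; each
    speaker's reconstruction then defines a soft mask applied to V.
    """
    W = np.hstack([W1, W2])
    H = sparse_activations(V, W, lam, n_iter)
    k = W1.shape[1]
    V1 = W1 @ H[:k]                         # speaker 1 part of the model
    V2 = W2 @ H[k:]                         # speaker 2 part of the model
    mask1 = V1 / (V1 + V2 + 1e-12)
    return mask1 * V, (1.0 - mask1) * V, H
```

In a full system `V` would be the magnitude spectrogram of the band-limited mixture, and `W1`, `W2` would be bases learned beforehand from each speaker's training data. Because the soft masks sum to one, the two estimates always add back up to the input magnitude.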
Keywords: sparse NMF; mixed speech recognition; machine learning

DOI: 10.1515/aoa-2017-0031

Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)