10.1515/aoa-2016-0011
An Effective Speaker Clustering Method using UBM and Ultra-Short Training Utterances
It is noteworthy that this improvement applies to all vowels, even though the clustering discussed in this report was based only on the phoneme /a/. This indicates a strong correlation between the articulation of different vowels, and the correlation is probably related to the size of the vocal tract.