Speaker Model Clustering to Construct Background Models for Speaker Verification

Gökay DİŞKEN; Zekeriya TÜFEKCİ; Ulus ÇEVİK

doi:10.1515/aoa-2017-0014

Authors

Gökay DİŞKEN Adana Science and Technology University, Turkey
Zekeriya TÜFEKCİ Çukurova University, Turkey
Ulus ÇEVİK Çukurova University, Turkey

Abstract

Conventional speaker recognition systems use the Universal Background Model (UBM) as an imposter model for all speakers. In this paper, speaker models are clustered to obtain better imposter model representations for speaker verification. First, a UBM is trained, and speaker models are adapted from the UBM. Then, the $k$-means algorithm with the Euclidean distance measure is applied to the speaker models. The speakers are divided into two, three, four, and five clusters. The resulting cluster centers are used as the background models of their respective speakers. Experiments showed that the proposed method consistently produced lower Equal Error Rates (EER) than the conventional UBM approach for test utterances of 3, 10, and 30 seconds, and also under channel mismatch conditions. The proposed method is also compared with the $i$-vector approach. The three-cluster model achieved the best performance, with a 12.4% relative EER reduction on average compared to the $i$-vector method. The statistical significance of the results is also reported.
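
The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes each adapted speaker model has already been flattened into a fixed-length vector (for example, its stacked component means); the abstract does not state the exact representation, so this is an assumption. The function names `kmeans` and `background_model_for`, the toy two-dimensional vectors, and the random initialization are hypothetical and serve only as illustration, not as the authors' implementation.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=100, seed=0):
    """Plain k-means with Euclidean distance.

    Returns (centers, assignments), where assignments[i] is the index
    of the cluster that vectors[i] belongs to.
    """
    rng = random.Random(seed)
    # Assumption: initialize the centers from k randomly chosen vectors.
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignments = [None] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center for every speaker vector.
        new_assignments = [
            min(range(k), key=lambda c: squared_euclidean(v, centers[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # converged: no speaker changed cluster
        assignments = new_assignments
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old center if a cluster is empty
                dim = len(members[0])
                centers[c] = [
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                ]
    return centers, assignments


def background_model_for(speaker_index, centers, assignments):
    """Cluster center used as the background model of one speaker."""
    return centers[assignments[speaker_index]]


# Toy "speaker model" vectors (hypothetical data): two obvious groups.
speaker_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, assignments = kmeans(speaker_vectors, k=2)
background = background_model_for(0, centers, assignments)
```

In this sketch, a verification trial for a given speaker would be scored against `background_model_for(...)` rather than against the single UBM; how the cluster center is turned back into a full background model is outside what the abstract specifies.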

Keywords:

Gaussian mixture models, k-means, imposter models, speaker clustering, speaker verification

