Spoofed Speech Detection with Weighted Phase Features and Convolutional Networks

Gökay DİŞKEN

doi:10.24425/aoa.2022.141648

Authors

Gökay DİŞKEN, Adana Alparslan Türkeş Science and Technology University, Turkey

Abstract

Detection of audio spoofing attacks has become vital for automatic speaker verification systems. Spoofing attacks can be carried out in several ways, such as speech synthesis, voice conversion, replay, and mimicry. Extracting discriminative features from speech data can improve the accuracy of detecting these attacks. In fact, a frame-wise weighted magnitude spectrum was recently found to be effective for detecting replay attacks. In this work, discriminative features are obtained in a similar fashion (frame-wise weighting); however, a cosine normalized phase spectrum is used, since phase-based features have shown decent performance for the given task. The extracted features are then fed to a convolutional neural network as input. In the experiments, the ASVspoof 2015 and 2017 databases are used to investigate the proposed system’s spoof detection performance for synthetic and replay attacks, respectively. The results show that the proposed approach achieved a 34.5% relative decrease in the average EER on the ASVspoof 2015 evaluation set compared to the ordinary cosine normalized phase features. Furthermore, the proposed system outperformed the other systems considered at detecting the S10 attack type of the ASVspoof 2015 database.
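
As a rough illustration of the feature idea summarized in the abstract, the sketch below computes a cosine normalized phase spectrum per frame and scales each frame by a weight. It is a minimal sketch under stated assumptions, not the author's implementation: the function names, the frame length and hop, the Hamming window, and above all the energy-based frame weight are hypothetical placeholders, since the abstract does not specify the weighting. Cosine normalization is taken here to mean applying the cosine to the phase of each DFT bin, which bounds the values to [-1, 1].

```python
import cmath
import math

def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def cos_phase(frame):
    """Cosine of the DFT phase for bins 0..N/2 of one Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    out = []
    for k in range(n // 2 + 1):
        # Naive DFT bin; fine for a sketch, use an FFT in practice.
        bin_k = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        out.append(math.cos(cmath.phase(bin_k)))
    return out

def weighted_cos_phase(signal, frame_len=64, hop=32):
    """Cosine normalized phase per frame, scaled by a placeholder frame weight."""
    fs = frames(signal, frame_len, hop)
    energies = [sum(x * x for x in f) for f in fs]
    peak = max(energies) or 1.0
    # ASSUMPTION: energy-based weight in [0, 1]; not the paper's weighting.
    return [[(e / peak) * c for c in cos_phase(f)]
            for f, e in zip(fs, energies)]

# Toy input: a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
features = weighted_cos_phase(tone)
```

The resulting `features` is a frames-by-bins matrix, the kind of two-dimensional input that could be passed to a convolutional network; the actual weighting and network configuration are described in the full paper.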

Keywords:

spoofing detection, cosine normalized cepstrum, convolutional neural networks

