10.24425/aoa.2022.141648
Spoofed Speech Detection with Weighted Phase Features and Convolutional Networks
References
Alam M.J., Kenny P., Bhattacharya G., Stafylakis T. (2015), Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015, [in:] Interspeech 2015, pp. 2072–2076, Dresden, Germany.
Alzantot M., Wang Z., Srivastava M.B. (2019), Deep residual neural networks for audio spoofing detection, [in:] Interspeech 2019, pp. 1078–1082, doi: 10.21437/Interspeech.2019-3174.
Białobrzeski R., Kosmider M., Matuszewski M., Plata M., Rakowski A. (2019), Robust Bayesian and light neural networks for voice spoofing detection, [in:] Interspeech 2019, pp. 1028–1032, doi: 10.21437/Interspeech.2019-2676.
Cai W., Wu H., Cai D., Li M. (2019), The DKU replay detection system for the ASVspoof 2019 challenge: on data augmentation, feature representation, classification, and fusion, [in:] Interspeech 2019, pp. 1023–1027, doi: 10.21437/Interspeech.2019-1230.
Chang S.-Y., Wu K.-C., Chen C.-P. (2019), Transfer-representation learning for detecting spoofing attacks with converted and synthesized speech in automatic speaker verification system, [in:] Interspeech 2019, pp. 1063–1067, doi: 10.21437/Interspeech.2019-2014.
Chen Z., Zhang W., Xie Z., Xu X., Chen D. (2018), Recurrent neural networks for automatic replay spoofing attack detection, [in:] IEEE International Conference on Acoustics, Speech and Signal Processing – Proceedings (ICASSP), pp. 2052–2056, doi: 10.1109/ICASSP.2018.8462644.
Chettri B., Benetos E., Sturm B.L.T. (2020), Dataset artefacts in anti-spoofing systems: a case study on the ASVspoof 2017 benchmark, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28: 3018–3028, doi: 10.1109/TASLP.2020.3036777.
Chettri B., Sturm B.L., Benetos E. (2018), Analysing replay spoofing countermeasure performance under varied conditions, [in:] 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6, doi: 10.1109/MLSP.2018.8516968.
De Leon P.L., Pucher M., Yamagishi J., Hernaez I., Saratxaga I. (2012), Evaluation of speaker verification security and detection of HMM-based synthetic speech, IEEE Transactions on Audio, Speech, and Language Processing, 20(8): 2280–2290, doi: 10.1109/TASL.2012.2201472.
Dehak N., Kenny P.J., Dehak R., Dumouchel P., Ouellet P. (2011), Front-end factor analysis for speaker verification, IEEE Transactions on Audio, Speech, and Language Processing, 19(4): 788–798, doi: 10.1109/TASL.2010.2064307.
Delgado H. et al. (2018), ASVspoof 2017 Version 2.0: Meta-data analysis and baseline enhancements, [in:] The Speaker and Language Recognition Workshop, pp. 296–303, doi: 10.21437/Odyssey.2018-42.
Dinkel H., Qian Y., Yu K. (2017), Small-footprint convolutional neural network for spoofing detection, [in:] 2017 International Joint Conference on Neural Networks (IJCNN), pp. 3086–3091, doi: 10.1109/IJCNN.2017.7966240.
Font R., Espín J.M., Cano M.J. (2017), Experimental analysis of features for replay attack detection – results on the ASVspoof 2017 Challenge, [in:] Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, pp. 7–11, doi: 10.21437/Interspeech.2017-450.
Gomez-Alanis A., Peinado A.M., Gonzalez J.A., Gomez A.M. (2019), A light convolutional GRU-RNN deep feature extractor for ASV spoofing detection, [in:] Interspeech 2019, pp. 1068–1072, doi: 10.21437/Interspeech.2019-2212.
González Hautamäki R., Kinnunen T., Hautamäki V., Laukkanen A.-M. (2015), Automatic versus human speaker verification: the case of voice mimicry, Speech Communication, 72: 13–31, doi: 10.1016/j.specom.2015.05.002.
Hanilçi C. (2018a), Features and classifiers for replay spoofing attack detection, [in:] 2017 10th International Conference on Electrical and Electronics Engineering, ELECO 2017, pp. 1187–1191, Bursa, Turkey.
Hanilçi C. (2018b), Linear prediction residual features for automatic speaker verification anti-spoofing, Multimedia Tools and Applications, 77(13): 16099–16111, doi: 10.1007/s11042-017-5181-0.
Hanilçi C., Kinnunen T., Sahidullah M., Sizov A. (2015), Classifiers for synthetic speech detection: a comparison, [in:] Interspeech 2015, pp. 2057–2061, Dresden, Germany.
Hanilçi C., Kinnunen T., Sahidullah M., Sizov A. (2016), Spoofing detection goes noisy: An analysis of synthetic speech detection in the presence of additive noise, Speech Communication, 85: 83–97, doi: 10.1016/j.specom.2016.10.002.
Jung J., Shim H., Heo H.-S., Yu H.-J. (2019), Replay attack detection with complementary high-resolution information using end-to-end DNN for the ASVspoof 2019 challenge, [in:] Interspeech 2019, pp. 1083–1087, doi: 10.21437/Interspeech.2019-1991.
Kinnunen T. et al. (2017), The ASVspoof 2017 Challenge: Assessing the limits of replay spoofing attack detection, [in:] Interspeech 2017, pp. 1–5, Stockholm, Sweden.
Liu M., Wang L., Dang J., Nakagawa S., Guan H., Li X. (2019), Replay attack detection using magnitude and phase information with attention-based adaptive filters, [in:] ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6201–6205, doi: 10.1109/ICASSP.2019.8682739.
Liu Y., Tian Y., He L., Liu J., Johnson M.T. (2015), Simultaneous utilization of spectral magnitude and phase information to extract supervectors for speaker verification anti-spoofing, [in:] Interspeech 2015, pp. 2082–2086, Dresden, Germany.
Novoselov S., Kozlov A., Lavrentyeva G., Simonchik K., Shchemelinin V. (2016), STC anti-spoofing systems for the ASVspoof 2015 challenge, [in:] 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5475–5479, doi: 10.1109/ICASSP.2016.7472724.
Patel T.B., Patil H.A. (2015), Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech, [in:] Interspeech 2015, pp. 2062–2066, Dresden, Germany.
Rafi B.S.M., Murty K.S.R. (2019), Importance of analytic phase of the speech signal for detecting replay attacks in automatic speaker verification systems, [in:] ICASSP 2019 – 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6306–6310, doi: 10.1109/ICASSP.2019.8683500.
Reynolds D.A., Quatieri T.F., Dunn R.B. (2000), Speaker verification using adapted Gaussian mixture models, Digital Signal Processing, 10(1–3): 19–41, doi: 10.1006/dspr.1999.0361.
Sahidullah M., Kinnunen T., Hanilçi C. (2015), A comparison of features for synthetic speech detection, [in:] Interspeech 2015, pp. 2087–2091, Dresden, Germany.
Singh M., Pati D. (2019), Combining evidences from Hilbert envelope and residual phase for detecting replay attacks, International Journal of Speech Technology, 22(2): 313–326, doi: 10.1007/s10772-019-09604-x.
Srinivas K., Das R.K., Patil H.A. (2018), Combining phase-based features for replay spoof detection system, [in:] 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 151–155, doi: 10.1109/ISCSLP.2018.8706672.
Tian X., Wu Z., Xiao X., Chng E.S., Li H. (2016), Spoofing detection from a feature representation perspective, [in:] 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2119–2123, doi: 10.1109/ICASSP.2016.7472051.
Todisco M., Delgado H., Evans N. (2017), Constant Q cepstral coefficients: a spoofing countermeasure for automatic speaker verification, Computer Speech & Language, 45: 516–535, doi: 10.1016/j.csl.2017.01.001.
Todisco M. et al. (2019), ASVspoof 2019: future horizons in spoofed and fake audio detection, [in:] Interspeech 2019, pp. 1008–1012, doi: 10.21437/Interspeech.2019-2249.
Tom F., Jain M., Dey P. (2018), End-to-end audio replay attack detection using deep convolutional networks with attention, [in:] Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, pp. 681–685, doi: 10.21437/Interspeech.2018-2279.
Wu Z., Chng E.S., Li H. (2012), Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition, [in:] 13th Annual Conference of the International Speech Communication Association 2012, Interspeech 2012, pp. 1698–1701, Portland, OR, USA.
Wu Z. et al. (2017), ASVspoof: The automatic speaker verification spoofing and countermeasures challenge, IEEE Journal of Selected Topics in Signal Processing, 11(4): 588–604, doi: 10.1109/JSTSP.2017.2671435.
Xiao X., Tian X., Du S., Xu H., Chng E.S., Li H. (2015), Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge, [in:] Interspeech 2015, pp. 2052–2056, Dresden, Germany.
Yang J., Das R.K. (2020), Long-term high frequency features for synthetic speech detection, Digital Signal Processing, 97(1): 1–11, doi: 10.1016/j.dsp.2019.102622.
Yang J., Das R.K., Li H. (2018), Extended constant-Q cepstral coefficients for detection of spoofing attacks, [in:] 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1024–1029, doi: 10.23919/APSIPA.2018.8659537.
Yang J., Liu L. (2018), Playback speech detection based on magnitude–phase spectrum, Electronics Letters, 54(14): 901–903, doi: 10.1049/el.2018.0739.
Yang J., Liu L., He Q. (2019), Discriminative feature based on FWMW for playback speech detection, Electronics Letters, 55(15): 861–864, doi: 10.1049/el.2019.1025.
Yang J., Xu L., Ren B., Ji Y. (2020), Discriminative features based on modified log magnitude spectrum for playback speech detection, EURASIP Journal on Audio, Speech, and Music Processing, doi: 10.1186/s13636-020-00173-5.
Zeinali H. et al. (2019), Detecting spoofing attacks using VGG and SincNet: BUT-Omilia submission to ASVspoof 2019 challenge, [in:] Interspeech 2019, pp. 1073–1077, doi: 10.21437/Interspeech.2019-2892.
Zhang C., Yu C., Hansen J.H.L. (2017), An investigation of deep-learning frameworks for speaker verification antispoofing, IEEE Journal of Selected Topics in Signal Processing, 11(4): 684–694, doi: 10.1109/JSTSP.2016.2647199.