10.24425/aoa.2021.136559
Heart Rate Detection and Classification from Speech Spectral Features Using Machine Learning
References
Borkovec T.D., Wall R.L., Stone N.M. (1974), False Physiological Feedback and the Maintenance of Speech Anxiety, Journal of Abnormal Psychology, 83(2): 164–168.
Bühlmann P., Yu B. (2003), Boosting with the L2 loss, Journal of the American Statistical Association, 98(462): 324–339, doi: 10.1198/016214503000125.
Burton D.A., Stokes K., Hall G.M. (2004), Physiological effects of exercise, Continuing Education in Anaesthesia Critical Care & Pain, 4(6): 185–188, doi: 10.1093/bjaceaccp/mkh050.
Criminisi A., Shotton J., Konukoglu E. (2011), Decision forests: a unified framework for classification, regression, density estimation, manifold learning and semi-supervised learning, Foundations and Trends® in Computer Graphics and Vision, 7(2–3): 81–227, doi: 10.1561/0600000035.
Davis S., Mermelstein P. (1980), Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4): 357–366, doi: 10.1109/TASSP.1980.1163420.
Dreiseitl S., Ohno-Machado L. (2002), Logistic regression and artificial neural network classification models: a methodology review, Journal of Biomedical Informatics, 35(5–6): 352–359, doi: 10.1016/S1532-0464(03)00034-0.
Euler C. Von (1982), Some aspects of speech breathing physiology, [In:] Speech Motor Control. Proceedings of an International Symposium on Speech Motor Control, Grillner S., Lindblom B., Lubker J., Persson A. [Eds], Stockholm, May 11–12, 1981, pp. 95–103, doi: 10.1016/B978-0-08-028892-5.50013-X.
Hermansky H., Morgan N. (1994), RASTA Processing of Speech, IEEE Transactions on Speech and Audio Processing, 2(4): 578–589, doi: 10.1109/89.326616.
Hermansky H. (1990), Perceptual Linear Predictive (PLP) analysis of speech, The Journal of the Acoustical Society of America, 87(4): 1738–1752, doi: 10.1121/1.399423.
Huang X., Acero A., Hon H.-W. (2001), Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall PTR.
James A.P. (2015), Heart rate monitoring using human speech spectral features, Human-Centric Computing and Information Sciences, 5(1): 1–12, doi: 10.1186/s13673-015-0052-z.
Kabal P. (2017), Audio File Format Specifications, MMSP Lab, McGill University, http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/CSL/CSL.html.
Kaur J., Kaur R. (2014), Extraction of heart rate parameters using speech analysis, International Journal of Science and Research (IJSR), 3(10): 1374–1376.
Kutner M.H., Nachtsheim C., Neter J., Li W. (2004), Applied Linear Statistical Models, 4th ed., Irwin: McGraw Hill.
Laskowski E.R. (2018), Heart Rate: What’s Normal?, Mayo Clinic, https://www.mayoclinic.org/healthy-lifestyle/fitness/expert-answers/heart-rate/faq-20057979.
Lin L.I-K. (1989), A concordance correlation coefficient to evaluate reproducibility, Biometrics, 45(1): 255–268, doi: 10.2307/2532051.
Logan B. (2000), Mel frequency cepstral coefficients for music modeling, [In:] 1st International Symposium on Music Information Retrieval, http://ismir2000.ismir.net/papers/logan_paper.pdf.
Lyons J. (2012), Mel Frequency Cepstral Coefficient (MFCC) Tutorial, Practical Cryptography, http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/#computing-the-mel-filter-bank.
MacGill M. (2017), Heart rate: what is a normal heart rate?, Medical News Today, https://www.medicalnewstoday.com/articles/235710.php.
Magre S., Deshmukh R.R., Shrishrimal P.P. (2013), A Comparative Study on Feature Extraction Techniques in Speech Recognition, [In:] International Conference on Recent Advances in Statistics and Their Applications, Aurangabad, https://www.researchgate.net/publication/278549945_A_Comparative_Study_on_Feature_Extraction_Techniques_in_Speech_Recognition.
Merhav N., Lee C.-H. (1993), On the asymptotic statistical behavior of empirical cepstral coefficients, IEEE Transactions on Signal Processing, 41(5): 1990–1993, doi: 10.1109/78.215323.
Mesleh A., Skopin D., Baglikov S., Quteishat A. (2012), Heart rate extraction from vowel speech signals, Journal of Computer Science and Technology, 27(6): 1243–1251, doi: 10.1007/s11390-012-1300-6.
Nasrabadi N.M. (2007), Pattern recognition and machine learning, Journal of Electronic Imaging, 16(4): 049901, doi: 10.1117/1.2819119.
Oppenheim A.V., Verghese G.C. (2015), Signals, Systems & Inference, Pearson.
Orlikoff R.F., Baken R.J. (1989), The effect of the heartbeat on vocal fundamental frequency perturbation, Journal of Speech and Hearing Research, 32(3): 576–582, http://www.ncbi.nlm.nih.gov/pubmed/2779201.
Partila P., Voznak M., Mikulec M., Zdralek J. (2012), Fundamental frequency extraction method using central clipping and its importance for the classification of emotional state, Advances in Electrical and Electronic Engineering, 10(4): 270–275, doi: 10.15598/aeee.v10i4.738.
Poh M.-Z., McDuff D.J., Picard R.W. (2011), Advancements in noncontact, multiparameter physiological measurements using a webcam, IEEE Transactions on Biomedical Engineering, 58(1): 7–11, doi: 10.1109/TBME.2010.2086456.
Ramig L.A. (1983), Effects of physiological aging on vowel spectral noise, Journal of Gerontology, 38(2): 223–225.
Reilly K.J., Moore C.A. (2003), Respiratory sinus arrhythmia during speech production, Journal of Speech, Language, and Hearing Research: JSLHR, 46(1): 164–177, http://www.ncbi.nlm.nih.gov/pubmed/12647896.
Reynolds A., Paivio A. (1968), Cognitive and emotional determinants of speech, Canadian Journal of Psychology/Revue Canadienne de Psychologie, 22(3): 164–175.
Roychowdhury S., Bihis M. (2016), AG-MIC: Azure-based generalized flow for medical image classification, IEEE Access, 4: 5243–5257, doi: 10.1109/ACCESS.2016.2605641.
Sakai M. (2015a), Feasibility study on blood pressure estimations from voice spectrum analysis, International Journal of Computer Applications, 109(7): 39–43, doi: 10.5120/19204-0848.
Sakai M. (2015b), Modeling the relationship between heart rate and features of vocal frequency, International Journal of Computer Applications, 120(6): 32–37, doi: 10.5120/21233-3986.
Schuller B., Friedmann F., Eyben F. (2013), Automatic recognition of physiological parameters in the human voice: heart rate and skin conductance, [In:] 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 7219–7223, doi: 10.1109/ICASSP.2013.6639064.
Schuller B., Friedmann F., Eyben F. (2014), The Munich Biovoice Corpus: effects of physical exercising, heart rate, and skin conductance on human speech production, [In:] Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), pp. 1506–1510. Reykjavik: European Language Resources Association (ELRA), http://www.lrec-conf.org/proceedings/lrec2014/pdf/611_Paper.pdf.
ScienceEncyclopedia (2019), Speech – the physiology of speech – air, vocal, words, and sound, JRank Articles, Science Encyclopedia, https://science.jrank.org/pages/6371/Speech-physiology-speech.html.
Scully C.G. et al. (2012), Physiological parameter monitoring from optical recordings with a mobile phone, IEEE Transactions on Biomedical Engineering, 59(2): 303–306, doi: 10.1109/TBME.2011.2163157.
Skopin D.E., Baglikov S.U. (2009), Heartbeat feature extraction from vowel speech signal using 2D spectrum representation, [In:] 4th International Conference on Information Technology (ICIT), Amman, Jordan, https://www.zuj.edu.jo/conferences/ICIT09/PaperList/Papers/Image and Signal Processing/450Demitry.pdf.
Tan Z.-H., Lindberg B. (2010), Low-complexity variable frame rate analysis for speech recognition and voice activity detection, IEEE Journal of Selected Topics in Signal Processing, 4(5): 798–807, doi: 10.1109/JSTSP.2010.2057192.
Trouvain J., Truong K.P. (2015), Prosodic characteristics of read speech before and after treadmill running, [In:] 16th Annual Conference of the International Speech Communication Association, Dresden, Germany (ISCA), https://research.utwente.nl/en/publications/prosodic-characteristics-of-read-speech-before-and-after-treadmil.
Tufekci Z., Gowdy J.N. (2000), Feature extraction using discrete wavelet transform for speech recognition, [In:] Proceedings of the IEEE SoutheastCon 2000, “Preparing for The New Millennium”, pp. 116–123, doi: 10.1109/SECON.2000.845444.
Usman M. (2017), On the performance degradation of speaker recognition system due to variation in speech characteristics caused by physiological changes, International Journal of Computing and Digital Systems, 6(3): 119–127, doi: 10.12785/IJCDS/060303.
Usman M., Zubair M., Shiblee M., Rodrigues P., Jaffar S. (2018), Probabilistic modeling of speech in spectral domain using maximum likelihood estimation, Symmetry, 10(12): 750, doi: 10.3390/sym10120750.
Wolf J.J. (1980), Speech signal processing and feature extraction, [In:] Spoken Language Generation and Understanding, pp. 103–128, Dordrecht: Springer Netherlands, doi: 10.1007/978-94-009-9091-3_6.
Yasuma F., Hayano J.-I. (2004), Respiratory sinus arrhythmia: why does the heartbeat synchronize with respiratory rhythm?, Chest, 125(2): 683–690, http://www.ncbi.nlm.nih.gov/pubmed/14769752.
Zhang G., Patuwo B.E., Hu M.Y. (1998), Forecasting with artificial neural networks: the state of the art, International Journal of Forecasting, 14(1): 35–62, doi: 10.1016/S0169-2070(97)00044-7.