Filled pauses and lengthenings detection using machine learning techniques
DOI:
https://doi.org/10.36505/ExLing-2016/07/0042/000301
Keywords:
speech disfluencies, filled pauses, spontaneous speech processing, Russian, ELM
Abstract
This paper addresses the detection and classification of filled pauses (FPs) and lengthenings in Russian speech using machine learning techniques, such as the extreme learning machine (ELM). The features used are formants, energy variation, and mel-frequency cepstral coefficients (MFCCs). The experiments on FP detection and classification are carried out on combined material from the SPIIRAS task-based dialog corpus, Russian casual conversations from the Binghamton Open Source Multi-Language Audio Database, reports from appendix No. 5 to the phonetic journal “Bulletin of the Phonetic Fund” of the Department of Phonetics of Saint Petersburg University, and a small part of the SWITCHBOARD corpus. The experimental results are evaluated with the F1 score; the best F1 score achieved was 0.42.
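For readers unfamiliar with the two technical ingredients named in the abstract, the following is a minimal sketch of a basic extreme learning machine (a random, fixed hidden layer with output weights solved by least squares) and of the F1 score for binary labels. It is an illustration under stated assumptions, not the authors' implementation: the function names `train_elm`, `predict_elm`, and `f1_score`, the hidden-layer size, the tanh activation, and the 0.5 decision threshold are all hypothetical choices, and the feature matrix `X` simply stands in for per-frame acoustic features such as MFCCs.

```python
import numpy as np


def train_elm(X, y, n_hidden=50, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights.

    X: (n_samples, n_features) feature matrix (assumed: acoustic features).
    y: (n_samples,) binary labels, 1 = filled pause/lengthening, 0 = other.
    """
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn once and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    # Output weights: minimum-norm least-squares solution of H @ beta = y.
    beta = np.linalg.pinv(H) @ y
    return W, b, beta


def predict_elm(model, X, threshold=0.5):
    """Return binary predictions (threshold is an illustrative assumption)."""
    W, b, beta = model
    scores = np.tanh(X @ W + b) @ beta
    return (scores >= threshold).astype(int)


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))
```

Because F1 ignores true negatives, it suits detection tasks in which the target events are rare relative to the rest of the signal, as filled pauses typically are in spontaneous speech.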
References
Akusok, A., Bjork, K. M., Miche, Y., Lendasse, A. 2015. High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access, 3, 1011-1025.
ComParE INTERSPEECH: Computational Paralinguistic Challenge, 2013. http://emotion-research.net/sigs/speech-sig/is13-compare
Department of Phonetics of Saint Petersburg University. http://phonetics.spbu.ru/
Prylipko, D., Egorow, O., Siegert, I., Wendemuth, A. 2014. Application of Image Processing Methods to Filled Pauses Detection from Spontaneous Speech. In Proc. of INTERSPEECH 2014, 1816-1820, Singapore.
Eyben, F., Wollmer, M., Schuller, B. 2010. OpenSMILE: the Munich Versatile and Fast Open-Source Audio Feature Extractor. In Proc. 18th ACM International conference on Multimedia, 1459-1462.
O'Connell, D., Kowal, S. 2004. The History of Research on the Filled Pause as Evidence of the Written Language Bias in Linguistics. Journal of Psycholinguistic Research, 33(6), 459-474.
Kibrik, A., Podlesskaya, V. (eds.). 2014. Rasskazy o Snovideniyah: Korpusnoye Issledovaniye Ustnogo Russkogo Diskursa [Night dream stories: Corpus study of Russian discourse]. Litres.
Godfrey, J. J., Holliman, E. C., McDaniel, J. 1992. SWITCHBOARD: Telephone Speech Corpus for Research and Development. In Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92), vol. 1, 517-520.
Verkhodanova, V., Shapranov, V. 2014. Automatic Detection of Filled Pauses and Lengthenings in the Spontaneous Russian Speech. In Proc. 7th International Conference Speech Prosody, 1110-1114, Dublin, Ireland.
Zahorian, S. A., Wu, J., Karnjanadecha, M., Vootkur, C. S., Wong, B., Hwang, A., Tokhtamyshev, E. 2011. Open-Source Multi-Language Audio Database for Spoken Language Processing Applications. In Proc. INTERSPEECH 2011, 1493-1496, Florence, Italy.
Boersma, P., Weenink, D. 2016. Praat: doing phonetics by computer [Computer program]. Version 6.0.11, retrieved 20 January 2016 from http://www.praat.org/
License
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.