Filled pauses and lengthenings detection using machine learning techniques

Vasilisa Verkhodanova; Vladimir Shapranov; Alexey Karpov

doi:10.36505/ExLing-2016/07/0042/000301

Authors

Vasilisa Verkhodanova, SPIIRAS, Russian Academy of Sciences, Russia
Vladimir Shapranov, SPIIRAS, Russian Academy of Sciences, Russia
Alexey Karpov, SPIIRAS, Russian Academy of Sciences, Russia

DOI:

https://doi.org/10.36505/ExLing-2016/07/0042/000301

Keywords:

speech disfluencies, filled pauses, spontaneous speech processing, Russian, ELM

Abstract

This paper addresses the detection and classification of filled pauses (FPs) and lengthenings in Russian using machine learning techniques such as extreme learning machines (ELM). The features used are formants, energy variation, and MFCCs. The experiments on FP detection and classification were carried out on joint material from the SPIIRAS task-based dialogs corpus, Russian casual conversations from the Binghamton Open Source Multi-Language Audio Database, reports from appendix No. 5 to the phonetic journal “Bulletin of the Phonetic Fund” belonging to the Department of Phonetics of Saint Petersburg University, and a small part of the SWITCHBOARD corpus. The experimental results are evaluated with the F1 score; the best F1 score achieved was 0.42.
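For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only and is not taken from the paper: it assumes binary per-frame labels (1 for a filled pause or lengthening, 0 otherwise), whereas the abstract does not say whether scoring was done per frame or per event. The function name `f1_score` and the label sequences are hypothetical.

```python
def f1_score(reference, predicted):
    """F1 for binary labels (assumed: 1 = filled pause/lengthening, 0 = other)."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have equal length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly detected
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # misses
    if tp == 0:
        return 0.0  # no correct detections: define F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, invented for illustration (not data from the paper).
reference = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(reference, predicted)
print(round(score, 3))
```

Because F1 ignores true negatives, it is not inflated by the large share of ordinary speech in which no disfluency occurs.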

References

Akusok, A., Bjork, K. M., Miche, Y., Lendasse, A. 2015. High-performance extreme learning machines: a complete toolbox for big data applications. IEEE Access, 3, 1011-1025.

ComParE INTERSPEECH: Computational Paralinguistic Challenge, 2013. http://emotion-research.net/sigs/speech-sig/is13-compare

Department of Phonetics of Saint Petersburg University. http://phonetics.spbu.ru/

Prylipko, D., Egorow, O., Siegert, I., Wendemuth, A. 2014. Application of Image Processing Methods to Filled Pauses Detection from Spontaneous Speech. In Proc. of INTERSPEECH 2014, 1816-1820, Singapore.

Eyben, F., Wollmer, M., Schuller, B. 2010. OpenSMILE: the Munich Versatile and Fast Open-Source Audio Feature Extractor. In Proc. 18th ACM International conference on Multimedia, 1459-1462.

O'Connell, D., Kowal, S. 2004. The History of Research on the Filled Pause as Evidence of the Written Language Bias in Linguistics. Journal of Psycholinguistic Research, vol. 33(6), 459-474.

Kibrik, A., Podlesskaya, V. (eds.). 2014. Rasskazy o Snovideniyah: Korpusnoye Issledovaniye Ustnogo Russkogo Diskursa [Night dream stories: Corpus study of Russian discourse]. Litres.

Godfrey, J. J., Holliman, E. C., McDaniel, J. 1992. SWITCHBOARD: Telephone Speech Corpus for Research and Development. In Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92), vol. 1, 517-520.

Verkhodanova, V., Shapranov, V. 2014. Automatic Detection of Filled Pauses and Lengthenings in the Spontaneous Russian Speech. In Proc. 7th International Conference Speech Prosody, 1110-1114, Dublin, Ireland.

Zahorian, S. A., Wu, J., Karnjanadecha, M., Vootkur, C. S., Wong, B., Hwang, A., Tokhtamyshev, E. 2011. Open-Source Multi-Language Audio Database for Spoken Language Processing Applications. In Proc. INTERSPEECH 2011, 1493-1496, Florence, Italy.

Boersma, P., Weenink, D. 2016. Praat: doing phonetics by computer [Computer program]. Version 6.0.11, retrieved 20 January 2016 from http://www.praat.org/
