Speaker and prosodic peculiarity classification in emotional speech

Authors

  • Neda Mousavi
  • Sven Grawunder

DOI:

https://doi.org/10.36505/ExLing-2024/15/0023/000648

Keywords:

rhythm, speaker classification, between-speaker variation, prosodic peculiarity, emotional speech

Abstract

This study investigates the relationship between rhythmic metrics, emotion recognition, and speaker variability using the German emotional speech corpus (VMEmo). With principal component analysis and linear discriminant analysis, classification accuracies are close to 0.40 when rhythmic features from the acoustic domains of time, intensity, and frequency are merged to identify linguistic behavior. When speakers are classified from individual rhythmic feature categories, however, accuracies range from 0.17 to 0.44, which points to substantial differences across these feature subgroups. This variation calls for further investigation of how the feature categories differ and how they affect speaker classification accuracy.
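
The abstract names principal component analysis followed by a linear discriminant classifier. A minimal sketch of that kind of pipeline is given below, assuming scikit-learn and NumPy are available. It is not the authors' implementation: the feature matrix is synthetic placeholder data, and the helper names (make_synthetic_features, pca_lda_accuracy), the number of speakers, the number of features, and the number of retained components are all hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_features(n_speakers=5, n_utterances=40, n_features=12, seed=0):
    """Return (X, y): placeholder feature vectors and hypothetical speaker labels.

    The values are random numbers standing in for rhythm measures; they are
    NOT derived from the VMEmo corpus or from any real speech.
    """
    rng = np.random.default_rng(seed)
    # Each hypothetical speaker gets its own mean vector; utterances scatter around it.
    speaker_means = rng.normal(0.0, 1.0, size=(n_speakers, n_features))
    X = np.vstack([
        mean + rng.normal(0.0, 2.0, size=(n_utterances, n_features))
        for mean in speaker_means
    ])
    y = np.repeat(np.arange(n_speakers), n_utterances)
    return X, y


def pca_lda_accuracy(X, y, n_components=6, folds=5):
    """Mean cross-validated accuracy of a standardize -> PCA -> LDA pipeline."""
    model = make_pipeline(
        StandardScaler(),                # put all features on a common scale
        PCA(n_components=n_components),  # reduce to a few principal components
        LinearDiscriminantAnalysis(),    # classify in the reduced space
    )
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


X, y = make_synthetic_features()
accuracy = pca_lda_accuracy(X, y)
chance = 1.0 / len(np.unique(y))  # chance level for balanced classes
print(f"accuracy={accuracy:.2f} (chance={chance:.2f})")
```

Printing the chance level next to the accuracy reflects a general point about reading such scores: a value such as 0.40 is only interpretable relative to the number of classes, which the abstract does not state. Swapping in a different subset of feature columns for X is one way to compare feature categories under the same pipeline.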

Published

01-01-2024

How to Cite

Speaker and prosodic peculiarity classification in emotional speech. (2024). Linguistic Proceedings Series, 15, 89-92. https://doi.org/10.36505/ExLing-2024/15/0023/000648
