Speaker and prosodic peculiarity classification in emotional speech

Neda Mousavi; Sven Grawunder

doi:10.36505/ExLing-2024/15/0023/000648

Keywords:

rhythm, speaker classification, between-speaker variation, prosodic peculiarity, emotional speech

Abstract

This study investigates the relationship between rhythmic metrics, emotion recognition, and speaker variability using the German emotional speech corpus (VMEmo). With principal component analysis and linear discriminant analysis, accuracies close to 0.40 are obtained when rhythmic features from the acoustic domains of time, intensity, and frequency are merged to identify linguistic behavior. However, speaker classification accuracy varies between 0.17 and 0.44 depending on the rhythmic feature category used, which points to substantial differences among these feature subgroups. These differences require further investigation to clarify how each feature category affects speaker classification accuracy.
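As an illustration of the kind of analysis the abstract describes, the following is a minimal sketch of a principal component analysis followed by a linear discriminant classifier, evaluated with cross-validation. It is not the authors' implementation: the feature matrix is synthetic, and the number of speakers, the number of features, the number of retained components, the standardization step, and the cross-validation scheme are all assumptions made for the example. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_features(n_speakers=5, n_utterances=40, n_features=12, seed=0):
    """Hypothetical stand-in for a table of rhythmic features (one row per utterance).

    Real features would be measured from the time, intensity, and frequency
    domains of the speech signal; here they are random numbers with a small
    speaker-specific offset.
    """
    rng = np.random.default_rng(seed)
    speaker_means = rng.normal(0.0, 1.0, size=(n_speakers, n_features))
    X = np.vstack([
        mean + rng.normal(0.0, 2.0, size=(n_utterances, n_features))
        for mean in speaker_means
    ])
    y = np.repeat(np.arange(n_speakers), n_utterances)  # speaker labels
    return X, y


def speaker_classification_accuracy(X, y, n_components=5, n_splits=5):
    """Mean cross-validated accuracy of a standardize -> PCA -> LDA pipeline."""
    model = make_pipeline(
        StandardScaler(),                # put all features on a common scale
        PCA(n_components=n_components),  # reduce to a few principal components
        LinearDiscriminantAnalysis(),    # classify speakers in the reduced space
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return scores.mean()


X, y = make_synthetic_features()
accuracy = speaker_classification_accuracy(X, y)
chance = 1.0 / len(np.unique(y))  # accuracy expected from guessing among balanced classes
print(f"mean CV accuracy: {accuracy:.2f} (chance level: {chance:.2f})")
```

Running the same function on column subsets of the feature matrix (for example, only duration-based or only intensity-based columns) is one way to compare feature categories. An accuracy such as 0.40 is only interpretable relative to the chance level, which depends on the number of classes; the abstract does not state that number.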

