Speaker and prosodic peculiarity classification in emotional speech
DOI:
https://doi.org/10.36505/ExLing-2024/15/0023/000648
Keywords:
rhythm, speaker classification, between-speaker variation, prosodic peculiarity, emotional speech
Abstract
This study investigates the relationship between rhythmic metrics, emotion recognition, and speaker variability using the German emotional speech corpus (VMEmo). Using principal component analysis and linear discriminant analysis, the results show accuracies close to 0.40 when rhythmic features from the acoustic domains of time, intensity, and frequency are merged to identify linguistic behavior. However, speaker classification accuracy ranges from 0.44 down to 0.17 depending on the rhythmic feature category used, which points to substantial differences among these feature subgroups. These differences call for further investigation into how the individual feature categories affect speaker classification accuracy.
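To make the classification setup in the abstract concrete, the following is a minimal sketch of a speaker classifier that chains principal component analysis and linear discriminant analysis, assuming scikit-learn. Everything specific in it is an illustrative assumption and not taken from the study: the feature matrix is synthetic random data standing in for rhythmic features, and the number of speakers, the number of retained components, the standardization step, and the 5-fold cross-validation are hypothetical choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_rhythm_features(n_speakers=10, n_utterances=40,
                                   n_features=12, seed=0):
    """Return (X, y): synthetic stand-ins for per-utterance rhythm features.

    Hypothetical data only: each speaker gets a random mean vector, and
    each utterance is that mean plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    speaker_means = rng.normal(0.0, 1.0, size=(n_speakers, n_features))
    X = np.repeat(speaker_means, n_utterances, axis=0)
    X = X + rng.normal(0.0, 1.0, size=X.shape)
    y = np.repeat(np.arange(n_speakers), n_utterances)
    return X, y


def cross_validated_accuracy(X, y, n_components=6, n_splits=5, seed=0):
    """Mean accuracy of a standardize -> PCA -> LDA pipeline under stratified k-fold CV."""
    model = make_pipeline(
        StandardScaler(),                 # put all features on a common scale
        PCA(n_components=n_components),   # reduce dimensionality
        LinearDiscriminantAnalysis(),     # classify speakers in the reduced space
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return float(scores.mean())


X, y = make_synthetic_rhythm_features()
accuracy = cross_validated_accuracy(X, y)
print(f"mean cross-validated speaker accuracy: {accuracy:.2f}")
```

Under these assumptions, repeating the call on different column subsets of X (for example, only the columns of one feature category) would give one accuracy per category, which is the kind of comparison behind the 0.44 to 0.17 range reported above. With 10 speakers, chance level in this sketch is 0.10.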
License
Articles are published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.