Temporal dynamics of acoustic emotion encoding

Yuxin Fan; Yufeng Wu

doi:10.36505/TheLinguisticProceedings/2025/16/01/005/000665

Authors

Yuxin Fan, Southeast University, China
Yufeng Wu, City University of Hong Kong, Hong Kong

Keywords:

speech emotion recognition (SER), affective computing, acoustic features

Abstract

Static analyses of speech emotion often overlook temporal dependencies. This study examines how the Valence, Arousal, and Dominance (VAD) of a preceding utterance moderate the relationship between acoustic features and the VAD of the subsequent utterance. Linear mixed-effects models were fitted to 5,221 utterances from the IEMOCAP corpus. The results showed that lagged VAD was the strongest predictor across all dimensions, demonstrating significant emotional inertia. In addition, the relationship between acoustic parameters and subsequent VAD was significantly moderated by lagged VAD. These findings confirm that acoustic-emotion associations are dynamic and context-dependent, challenging static models and highlighting the importance of incorporating temporal dynamics into emotion recognition systems.
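
The moderation analysis summarized in the abstract can be written, for a single VAD dimension and a single acoustic feature, as a mixed-effects model with an interaction term. The following is only a sketch under stated assumptions: the abstract does not give the model equation, so the symbols, the single-feature form, and the random intercept for a grouping unit (for example, speaker or dialogue) are illustrative choices, not the authors' specification.

```latex
% Illustrative model; all notation is assumed, not taken from the paper.
% y_{ij}   : rating on one VAD dimension for utterance i in grouping unit j
% y_{i-1,j}: rating of the preceding utterance (lagged VAD)
% x_{ij}   : one acoustic feature of utterance i
% u_j      : random intercept for grouping unit j (assumed random-effects structure)
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\, y_{i-1,j} + \beta_2\, x_{ij}
          + \beta_3 \left( x_{ij} \times y_{i-1,j} \right) + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In this notation, a positive $\beta_1$ corresponds to emotional inertia, and a nonzero $\beta_3$ corresponds to moderation: the slope of the acoustic feature, $\beta_2 + \beta_3\, y_{i-1,j}$, then depends on the VAD of the preceding utterance.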

References

Fontaine, J. R. J., Scherer, K. R., Roesch, E. B., & Ellsworth, P. C. (2007). The world of emotions is not two-dimensional. Psychological Science, 18, 1050–1057.

Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., Chang, J. N., Lee, S., & Narayanan, S. S. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42, 335–359.

Goudbeek, M., & Scherer, K. (2010). Beyond arousal: Valence and potency/control cues in the vocal expression of emotion. Journal of the Acoustical Society of America, 128(3), 1322–1336.

Schuller, B. W. (2012). The computational paralinguistics challenge. IEEE Signal Processing Magazine, 29, 97–101.

Eyben, F., et al. (2016). The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Transactions on Affective Computing, 7, 190–202.

Jadoul, Y., Thompson, B., & de Boer, B. (2018). Introducing Parselmouth: A Python interface to Praat. Journal of Phonetics, 71, 1–15.
