Temporal dynamics of acoustic emotion encoding

Authors

  • Yuxin Fan, Southeast University, China
  • Yufeng Wu, City University of Hong Kong, Hong Kong

DOI:

https://doi.org/10.36505/TheLinguisticProceedings/2025/16/01/005/000665

Keywords:

speech emotion recognition (SER), affective computing, acoustic features

Abstract

Static analyses of speech emotion often overlook temporal dependencies. This study examines how the Valence, Arousal, and Dominance (VAD) of a preceding utterance moderate the relationship between acoustic features and the VAD of the subsequent utterance. Linear mixed-effects models were fitted to 5,221 utterances from the IEMOCAP corpus. The results showed that lagged VAD was the strongest predictor across all dimensions, demonstrating significant emotional inertia. In addition, the relationship between acoustic parameters and subsequent VAD was significantly moderated by lagged VAD. These findings confirm that acoustic-emotion associations are dynamic and context-dependent, challenging static models and highlighting the importance of incorporating temporal dynamics into emotion recognition systems.
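
The moderation design described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the data are synthetic rather than IEMOCAP, the single feature `pitch`, the coefficient values, and the helper names `add_lag` and `fit_moderation` are all hypothetical, and ordinary least squares stands in for the linear mixed-effects models, so no random effects are estimated. The sketch shows two things: how a lagged rating is built within each dialogue, and how a feature-by-lag interaction term captures moderation.

```python
import numpy as np

def add_lag(dialogue_ids, values):
    """Previous-utterance value within each dialogue (NaN at dialogue starts)."""
    dialogue_ids = np.asarray(dialogue_ids)
    values = np.asarray(values, dtype=float)
    lagged = np.full(values.shape, np.nan)
    # Positions whose predecessor belongs to the same dialogue.
    idx = np.flatnonzero(dialogue_ids[1:] == dialogue_ids[:-1]) + 1
    lagged[idx] = values[idx - 1]
    return lagged

def fit_moderation(feature, lagged, outcome):
    """OLS fit of outcome ~ feature + lag + feature:lag (a simplification,
    not a mixed-effects model)."""
    keep = ~np.isnan(lagged)  # drop first utterances, which have no lag
    x, lag, y = feature[keep], lagged[keep], outcome[keep]
    design = np.column_stack([np.ones_like(x), x, lag, x * lag])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    names = ["intercept", "feature", "lag", "feature_x_lag"]
    return dict(zip(names, beta))

# Synthetic data (assumed values, for illustration only):
# arousal_t = 0.6 * arousal_{t-1} + (0.5 + 0.3 * arousal_{t-1}) * pitch_t + noise
rng = np.random.default_rng(0)
n_dialogues, n_utterances = 50, 20
dialogue = np.repeat(np.arange(n_dialogues), n_utterances)
pitch = rng.normal(size=dialogue.size)
arousal = np.zeros(dialogue.size)
for i in range(dialogue.size):
    is_first = i == 0 or dialogue[i] != dialogue[i - 1]
    prev = 0.0 if is_first else arousal[i - 1]
    arousal[i] = 0.6 * prev + (0.5 + 0.3 * prev) * pitch[i] + rng.normal(scale=0.1)

coef = fit_moderation(pitch, add_lag(dialogue, arousal), arousal)
for name, value in coef.items():
    print(f"{name}: {value:.3f}")
```

In this sketch a nonzero `lag` coefficient corresponds to emotional inertia, and a nonzero `feature_x_lag` coefficient means the feature's association with the current rating depends on the preceding rating. A mixed-effects version would additionally model grouping structure such as speaker or dialogue, which the abstract does not specify.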

Published

01-09-2025

How to Cite

Fan, Y., & Wu, Y. (2025). Temporal dynamics of acoustic emotion encoding. Linguistic Proceedings Series, 16(1), 17–20. https://doi.org/10.36505/TheLinguisticProceedings/2025/16/01/005/000665
