Speech rate perception and interlocutor identification in human-directed vs. device-directed speech

Yahya Aldholmi; May Al-Sager; Arwa Alsahafi; Reema Alshiddi

doi:10.36505/TheLinguisticProceedings/2025/16/01/001/000661

Authors

Yahya Aldholmi, King Saud University, Saudi Arabia
May Al-Sager, King Saud University, Saudi Arabia
Arwa Alsahafi, King Saud University, Saudi Arabia
Reema Alshiddi, King Saud University, Saudi Arabia

Keywords:

human-directed speech, speech perception, device-directed speech, speech rate, interlocutor identification

Abstract

This study investigates how listeners perceive differences between human-directed and device-directed speech, focusing on speech rate and interlocutor identification. Seventy-eight native Arabic speakers (aged 19–22; M = 20.46, SD = 1.11) participated in two tasks: rating the speed of 30 short recordings and determining whether each sample was directed towards a person or a device. The results showed that device-directed speech was consistently perceived as faster, while human-directed speech enabled more accurate interlocutor identification. Statistical analyses confirmed that these differences were significant, with moderate effect sizes. The findings suggest that devices produce speech efficiently but lack the natural variability that characterises human communication. Incorporating more dynamic and expressive features into voice systems could improve user engagement. Future research should consider cultural differences and emotional tone in shaping speech perception.
